A SIGNIFICANCE TEST FOR THE HYPOTHESIS 
THAT TWO VARIABLES MEASURE THE SAME TRAIT 
EXCEPT FOR ERRORS OF MEASUREMENT* 


Frederic M. Lord 
EDUCATIONAL TESTING SERVICE 


The likelihood-ratio significance test is derived for the hypothesis that 
after correction for attenuation two variables have a perfect correlation in 
the population from which the sample is drawn. 


It is frequently desired to determine whether or not two somewhat 
different tests are actually measuring the same thing. Is there really any 
difference, for example, between the abilities measured by a synonyms test 
and the abilities measured by an antonyms test? Or between the abilities 
measured by the “college” and the “high school” forms of a test of “quantitative”
aptitude? If the same student takes both forms, his two scores may 
be obviously different from each other, partly because the two score scales 
have a different origin and a different unit of measurement, and partly 
because each score contains an error of measurement. The basic question 
to be asked is whether these two tests would have a correlation of 1.00 if 
all errors of measurement were eliminated. The purpose of the present paper 
is to derive and present the likelihood-ratio significance test for the hypothesis 
that such a correlation is 1.00. Although it would be desirable to take the 
effects of sampling test items into account, the derivation here will be con- 
cerned only with the sampling of examinees. 

The procedure for estimating what the value of a correlation coefficient 
would be if all errors of measurement were eliminated was developed by 
Spearman, who called it correcting for attenuation. The basic formula is 

$$P_{xy} = \frac{\rho_{xy}}{\sqrt{\rho_{xx}\,\rho_{yy}}}, \tag{1}$$

where $\rho_{xy}$ is the population value of the correlation between test X and test 
Y, $\rho_{xx}$ and $\rho_{yy}$ are the population values of the reliability coefficients for the 
two tests, and $P_{xy}$ is the population value of the correlation between x and 
y corrected for attenuation, which for the sake of brevity will hereafter be 
called the disattenuated correlation. The assumptions underlying (1) are 
discussed in many texts (e.g., [3], chap. 9). 
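
As a small illustration of formula (1), the following Python sketch computes a correlation corrected for attenuation. The function name `disattenuate` and the input values are illustrative assumptions, not part of the original paper; the formula is applied here to sample values, which is the use the later sections examine.

```python
import math

def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Formula (1): correlation divided by the root of the two reliabilities."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical inputs: an observed correlation of .88 between two tests
# whose reliabilities are both .90.
estimate = disattenuate(0.88, 0.90, 0.90)
print(round(estimate, 3))
```

Nothing in the function prevents a result above 1.00 when sample values are used, which is the situation the following paragraphs discuss.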

*The writer is indebted to Professor John W. Tukey for his valuable suggestions on 
an earlier draft. 

If the errors of measurement in x are uncorrelated with y and the errors 
of measurement in y are uncorrelated with x, as they are ordinarily assumed 
to be, then the true value of the disattenuated correlation coefficient as 
defined by (1) can never be greater than 1.00 or less than −1.00. In working 
with actual data, however, sampling fluctuations may cause the observed 
correlation ($r_{xy}$) between test X and test Y to be larger than the true value 
($\rho_{xy}$), or the observed reliability coefficients ($r_{xx}$ and $r_{yy}$) to be unduly small. 
It therefore sometimes occurs, when sample values are substituted for popula- 
tion values in (1), that the resulting estimate of the disattenuated correlation 
is numerically larger than 1.00. This unfortunate result of sampling fluctua- 
tions has caused the whole notion of correction for attenuation to be regarded 
with mistrust by many workers. Actually, however, the correction for at- 
tenuation answers a real and important question, and there is no need for 
mistrusting the results if they are properly interpreted. 

The practical problem arises when a disattenuated correlation coefficient 
has been computed from actual data and found to be equal, say, to .90 or 
to .95. Is the obtained disattenuated correlation coefficient consistent with 
the hypothesis that test X and test Y are really measures of the same ability 
or trait, i.e., with the hypothesis that $P_{xy} = 1$? 

The reliability coefficients to be used in the denominator of (1) may be 
obtained in various ways, as for example by one of the Kuder-Richardson 
methods. The present derivation will be concerned, however, with the case 
where the observed reliabilities are obtained by correlating parallel forms 
of the test. In the case where parallel forms of each test have been administered 
to the experimental group, there are four different observed correlations 
between a form X and a form Y. Various ways of combining these four 
observed correlations to obtain a numerator term for (1) may be readily 
devised. Kelley ([4], eqs. 13:85, 13:91) gives two alternate formulas, and a 
third one will result from the present derivation. 

Large-sample standard error formulas are available for the disattenuated 
correlation coefficient (e.g., [4], pp. 526-529). The standard error, however, 
cannot be used with any assurance to test the hypothesis that $P_{xy} = 1$ 
because it is not clear to what extent the distribution of the sample estimate 
of $P_{xy}$ can be approximated by the normal curve. 

The problem concerns four random variables (scores on the four test 
forms), denoted $x_1$, $x_2$, $y_3$, and $y_4$. These variables will be assumed to have 
a normal multivariate distribution with unspecified means and the covariance 
matrix $[\sigma_{ij}]$ $(i, j = 1, 2, 3, 4)$. The two x-variables, and likewise the two 
y-variables, are assumed to be “parallel,” i.e., 

$$\sigma_{11} = \sigma_{22}, \quad \sigma_{33} = \sigma_{44}, \quad \sigma_{13} = \sigma_{14} = \sigma_{23} = \sigma_{24}. \tag{2}$$


In practical work, the assumptions stated in (2) can all be tested simul- 
taneously by means of Votaw’s test for compound symmetry [8] before any 
further calculations are carried out. 

The null hypothesis, $H_0$, is that $P_{xy} = 1$; the alternative, $H_1$, is that 
$P_{xy} < 1$. Under $H_0$, the maximum likelihood estimates, $\tilde\sigma_{ij}$, of the unknown 
parameters will be given by (26), (27), (45), and (46). Under $H_1$, the maximum 
likelihood estimates, $\check\sigma_{ij}$, are given by (48) and by (10) through (14); the 
corresponding maximum likelihood estimate, $\check P_{xy}$, of the disattenuated 
correlation is equal to the quantity $\hat P_{xy}$ given by (18) except that if $\hat P_{xy} > 1$, 
then $\check P_{xy} = 1$. 

The likelihood ratio statistic $\chi_1^2$ for testing the null hypothesis is given 
by (56). The significance test is carried out by looking up $\chi_1$ in a normal 
curve table. The procedure is illustrated by the following numerical examples. 


Numerical Examples 


Table 1 summarizes significance tests of the null hypothesis for several 
sets of hypothetical data. These data were chosen with a view to simplifying 
the calculations, taking advantage of the fact that when the observed statis- 
tics for variables 1 and 2 are equal to the corresponding statistics for variables 
3 and 4, then considerations of symmetry lead to a simple solution for $\tilde\sigma_{12}$ 
and $\tilde\sigma_{34}$, as given by (47). Thus in Table 1, the observed variances are all 
assumed to be equal ($s_{11} = s_{22} = s_{33} = s_{44}$), and the observed correlations 
($r_{ij}$) are assumed to be such that $r_{12} = r_{34}$, $r_{13} = r_{14} = r_{23} = r_{24}$. Although 
real data do not ordinarily display such exact relationships, they frequently 
may approximate the data of Table 1. Under these conditions, the maximum 
likelihood estimates under $H_1$ of the population correlation coefficients are 
given by $\check\rho_{ij} = r_{ij}$; from (47), the maximum likelihood estimates under 
$H_0$ are given by $\tilde\rho_{12} = \tilde\rho_{34} = \tilde\rho_{13} = \tfrac{1}{3}(r_{12} + 2r_{13})$. 

Each row of the table is assumed to be based on N = 100 cases. 
The first two values in each row of the table represent the observed data. The 
value in the third column is obtained by the special formula given in the 
preceding paragraph. The fourth column gives the estimate of the dis- 
attenuated correlation derived from (18). The last two columns give the 
likelihood ratio statistic, computed from (56), and the probability that 
as large a value of this statistic would occur by chance under the null hypoth- 
esis. 

The first line of Table 1 shows that if test X and test Y each has a 
reliability of .90, then, for the data illustrated, a sample disattenuated 
correlation of .978 lies at the 2½-per cent level under the null hypothesis. 
The third and fifth lines of the table show that when the test reliabilities 
are both .80, a sample disattenuated correlation of .978 lies at the 17-per cent 
significance level, and a sample disattenuated correlation of .95 lies at the 
2½-per cent significance level. A comparison of these figures indicates, as 
would be expected, that the lower the test reliability, the more the sample 
disattenuated correlation may be expected to differ from 1.00 solely as the 
result of sampling fluctuations. 
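
The rows of Table 1 can be recomputed from the two observed correlations alone. The sketch below is an illustration under the table's stated symmetry assumptions (equal variances, $r_{12} = r_{34}$, and one common cross-correlation); the function name `table1_row` is hypothetical. It uses the special-case estimate from (47) and the statistic (56) with the factor $N - 1$, so the second decimal may differ slightly from the printed table.

```python
import math

def table1_row(r12: float, r13: float, n: int = 100):
    """Symmetric special case: r12 = r34 and r13 = r14 = r23 = r24."""
    p_hat = r13 / r12                       # r13 / sqrt(r12 * r34) with r12 = r34
    if p_hat >= 1.0:
        # Estimates under H0 and H1 coincide, so the statistic is zero.
        return r12, p_hat, 0.0, 0.5
    rho0 = (r12 + 2.0 * r13) / 3.0          # common H0 estimate, from (47)
    # Numerator and denominator of the ratio in (56); variances cancel.
    num = (1 - rho0) ** 2 * ((1 + rho0) ** 2 - 4 * rho0 ** 2)
    den = (1 - r12) ** 2 * ((1 + r12) ** 2 - 4 * r13 ** 2)
    chi_sq = (n - 1) * math.log(num / den)  # natural-log form of (56)
    chi = math.sqrt(max(chi_sq, 0.0))
    level = 0.5 * math.erfc(chi / math.sqrt(2.0))  # upper tail of the normal curve
    return rho0, p_hat, chi, level

for r12, r13 in [(0.90, 0.88), (0.90, 0.87), (0.80, 0.76)]:
    rho0, p_hat, chi, level = table1_row(r12, r13)
    print(f"{r12:.2f} {r13:.2f} {rho0:.6f} {p_hat:.4f} {chi:.2f} {level:.3f}")
```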
TABLE 1 

Significance Test for the Hypothesis That the Disattenuated Correlation 
Equals 1.00 for Various Hypothetical Data ($N = 100$, $s_{11} = s_{22} = s_{33} = s_{44}$) 

Columns: 
  (a), (b)  Observed correlations = maximum likelihood estimates under $H_1$: 
            (a) $\check\rho_{12} = \check\rho_{34} = r_{12} = r_{34}$;  (b) $\check\rho_{13} = r_{13} = r_{14} = r_{23} = r_{24}$ 
  (c)       Maximum likelihood estimates under $H_0$: $\tilde\rho_{12} = \tilde\rho_{34} = \tilde\rho_{13}$ 
  (d)       Estimated disattenuated correlation under $H_1$: $r_{13}/\sqrt{r_{12}r_{34}}$ 
  (e)       Likelihood ratio statistic $\chi_1$ 
  (f)       Significance level 

  (a)     (b)        (c)        (d)      (e)     (f) 
  .90     .88        .886667    .978     1.98    .025 
  .90     .87        .88        .967     2.77    .003 

  .80     .782222    .788148    .978     0.95    .17 
  .80     .77        .78        .9625    1.54    .06 
  .80     .76        .773333    .95      1.97    .025 
  .80     .75        .766667    .9375    2.50    .005 
  .80     .74        .76        .925     2.76    .003 

The following sample covariance matrix, computed from the test scores 
of 649 examinees, will provide a numerical example of some practical interest: 

$$[s_{ij}] = \begin{bmatrix} 86.3979 & 57.7751 & 56.8651 & 58.8986 \\ 57.7751 & 86.2632 & 59.3177 & 59.6683 \\ 56.8651 & 59.3177 & 97.2850 & 73.8201 \\ 58.8986 & 59.6683 & 73.8201 & 97.8192 \end{bmatrix}$$

The corresponding sample correlation matrix is 

$$[r_{ij}] = \begin{bmatrix} 1 & .6692 & .6203 & .6407 \\ .6692 & 1 & .6475 & .6496 \\ .6203 & .6475 & 1 & .7567 \\ .6407 & .6496 & .7567 & 1 \end{bmatrix}$$

The first two variables in these matrices are parallel forms of a 15-item 
vocabulary test administered under such liberal time limits that approxi- 
mately 97 per cent of the examinees completed each form; the last two 
variables are parallel forms of a 75-item vocabulary test constructed so as 
to be as parallel as possible to the 15-item forms except that the adminis- 
tration time allowed was so short that only about two per cent of the ex- 
aminees completed each form. A more detailed description of the data is 
given in [5]. 

The question here is whether or not the speeded and unspeeded 
vocabulary tests actually measure the same ability. Assuming only that the 
first two variables are parallel measures and the last two variables likewise, 
the maximum likelihood estimates of the true variances and covariances 
are readily found from (10) through (14) to be 

$$[\hat\sigma_{ij}] = \begin{bmatrix} 86.3305 & 57.7751 & 58.6874 & 58.6874 \\ 57.7751 & 86.3305 & 58.6874 & 58.6874 \\ 58.6874 & 58.6874 & 97.5521 & 73.8201 \\ 58.6874 & 58.6874 & 73.8201 & 97.5521 \end{bmatrix}$$


The best estimates of the true values of the intercorrelations among the 
variables under the assumptions just stated are, according to (16) and (17), 


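$$[\hat\rho_{ij}] = \begin{bmatrix} 1 & .669231 & .639505 & .639505 \\ .669231 & 1 & .639505 & .639505 \\ .639505 & .639505 & 1 & .756725 \\ .639505 & .639505 & .756725 & 1 \end{bmatrix}$$

It is interesting to note that the values of $\hat\rho_{12}$ and $\hat\rho_{34}$ are, respectively, 
identical to $r_{12}$ and $r_{34}$ to four figures, and that $\hat\rho_{13}$ is identical to four figures 
with the average value $\bigl(\sum_{i=1}^{2}\sum_{j=3}^{4} r_{ij}\bigr)/4$. 

The best estimate of $P_{xy}$ is obtained by substituting $\hat\rho_{13}$ for $\rho_{xy}$ in equation 
(1), $\hat\rho_{12}$ for $\rho_{xx}$, and $\hat\rho_{34}$ for $\rho_{yy}$. The result is $\hat P_{xy} = .898643$. 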
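
The arithmetic of this step is easy to retrace. The following sketch applies equations (10) through (14) and (18) to the sample covariance matrix printed above; the function name `unrestricted_estimates` and the variable names are illustrative assumptions, and the code is only a check on the printed figures, not the author's computing procedure.

```python
import math

# Sample covariance matrix of the four test forms (N = 649), as printed above.
s = [
    [86.3979, 57.7751, 56.8651, 58.8986],
    [57.7751, 86.2632, 59.3177, 59.6683],
    [56.8651, 59.3177, 97.2850, 73.8201],
    [58.8986, 59.6683, 73.8201, 97.8192],
]

def unrestricted_estimates(s):
    """Equations (10)-(14): estimates assuming only that the forms are parallel."""
    var_x = 0.5 * (s[0][0] + s[1][1])                       # (10)
    var_y = 0.5 * (s[2][2] + s[3][3])                       # (11)
    cov_xx = s[0][1]                                        # (12)
    cov_yy = s[2][3]                                        # (13)
    cov_xy = (s[0][2] + s[0][3] + s[1][2] + s[1][3]) / 4.0  # (14)
    return var_x, var_y, cov_xx, cov_yy, cov_xy

var_x, var_y, cov_xx, cov_yy, cov_xy = unrestricted_estimates(s)
p_hat = cov_xy / math.sqrt(cov_xx * cov_yy)                 # (18)
rho_12 = cov_xx / var_x                                     # (16)
rho_13 = cov_xy / math.sqrt(var_x * var_y)                  # (17)
print(round(var_x, 4), round(var_y, 4), round(cov_xy, 4))
print(round(rho_12, 6), round(rho_13, 6), round(p_hat, 6))
```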














TABLE 2 

Points Lying on the Two Curves 
Representing Equations 45 and 46 

      Equation 45                     Equation 46 
  $\tilde\sigma_{12}$      $\tilde\sigma_{34}$           $\tilde\sigma_{34}$      $\tilde\sigma_{12}$ 

  60        54.2            60        65.5 
  50        74.0            70        53.6 
  52        70.7            73        48.4 
  51.7      71.169          70.8      52.38 
  51.64     71.2627         71.18     51.786 
  51.63     71.2855         71.263    51.6511 
  51.638    71.2721         71.278    51.6238 
  51.639    71.2704         71.271    51.6343 
                            71.269    51.6390 

The values of $\tilde\sigma_{ij}$ were determined from (27), (45), (46) and (26). Equations 
(45) and (46) were solved graphically by finding the intersection of 
the two curves representing the equations, a larger and larger scale being 
used in successive plots of the curves in order to locate the intersection 
accurately. As shown in Table 2, fewer than ten points had to be computed 
and plotted for each of the two curves in order to find their intersection 
graphically to five figures. The best estimate of the covariance matrix under 
the null hypothesis was thus found to be 


$$[\tilde\sigma_{ij}] = \begin{bmatrix} 86.330 & 51.639 & 60.665 & 60.665 \\ 51.639 & 86.330 & 60.665 & 60.665 \\ 60.665 & 60.665 & 97.552 & 71.270 \\ 60.665 & 60.665 & 71.270 & 97.552 \end{bmatrix}$$


The estimates of the correlations between the variables under the null 
hypothesis are 


$$[\tilde\rho_{ij}] = \begin{bmatrix} 1 & .59815 & .66106 & .66106 \\ .59815 & 1 & .66106 & .66106 \\ .66106 & .66106 & 1 & .73058 \\ .66106 & .66106 & .73058 & 1 \end{bmatrix}$$

The value of $\chi_1^2$ found from (56) is 35.30, and the corresponding value 
of $\chi_1$ is 5.94. Since $\chi_1$ has a normal distribution, the value of 5.94 is obviously 
highly significant, and it may be concluded that the speeded vocabulary 
tests measure an ability somewhat different from the unspeeded tests. 

In order to illustrate the use of the normal curve tables, the foregoing 
numerical example will be completed, finally, as it would have been if the 
number of cases had been 101. In this case, 

$$\chi_1^2 = \frac{100}{648}(35.30) = 5.4475 \quad\text{and}\quad \chi_1 = 2.33.$$

This is at the one per cent level, since one per cent of the area of a normal 
curve lies beyond +2.33. 


A Solution for Maximum Likelihood Estimators 


Since (i) the normal multivariate distribution is determined by its first 
two moments, (ii) the second moments of a sample from a normal distribution 
are distributed independently of the first moments, and (iii) the problem 
here is concerned with second moments and not with first moments, it follows 
that we may restrict attention to the sampling distribution of second moments, 
i.e., to the Wishart distribution (e.g., [2], pp. 403-406). For present purposes 
this may be written 


$$L(\sigma_{11}, \ldots, \sigma_{44}) = K\,|\sigma|^{-(N-1)/2} \exp\Bigl(-\frac{N-1}{2}\sum_{i=1}^{4}\sum_{j=1}^{4}\sigma^{ij}s_{ij}\Bigr), \tag{3}$$

where $K$ is an expression that does not involve the $\sigma_{ij}$, $N$ is the number of 
examinees, $|\sigma|$ is the determinant of $[\sigma_{ij}]$, $\sigma^{ij}$ is an element of the inverse 
matrix $[\sigma^{ij}] = [\sigma_{ij}]^{-1}$, and $s_{ij}$ is the unbiased sample estimator obtained by 
multiplying the usual sample variance or covariance by $N/(N - 1)$. It is 
implicit in (3) that $|\sigma| \neq 0$. 

It is convenient to rewrite the restrictions in (2) in terms of the $\sigma^{ij}$, so 
that they become 

$$\sigma^{11} = \sigma^{22}, \quad \sigma^{33} = \sigma^{44}, \quad \sigma^{13} = \sigma^{14} = \sigma^{23} = \sigma^{24}. \tag{4}$$


There is implicit in $H_1$ a further restriction not ordinarily encountered in 
the usual multivariate tests of significance. This restriction is that $P_{xy} \le 1$. 
Since by (1), 

$$P_{xy} = \frac{\sigma_{xy}/\sigma_x\sigma_y}{\sqrt{(\sigma_{12}/\sigma_x^2)(\sigma_{34}/\sigma_y^2)}} = \frac{\sigma_{xy}}{\sqrt{\sigma_{12}\sigma_{34}}}, \tag{5}$$

this restriction may be restated as 

$$\sigma_{xy}^2 \le \sigma_{12}\sigma_{34}. \tag{6}$$

It should be noted that this inequality is not implied by the Gramian character 
of the covariance matrix. This inequality reflects the fact that $x_1$ and $x_2$ are 
known to contain errors of measurement that are uncorrelated with $y_3$ and 
$y_4$ and that $y_3$ and $y_4$ are known to contain errors of measurement uncorrelated 
with $x_1$ and $x_2$. These errors of measurement impose an upper limit on 
the correlation between variables x and y. 

A restriction involving an inequality introduces certain difficulties into 
the maximization problem. It will be convenient first to carry through the 
maximization while ignoring the inequality. The results obtained when the 
inequality is taken into consideration will be worked out in a later section. 

The quantity to be differentiated may be written 

$$Q = -\log|\sigma| - \sum_i\sum_j \sigma^{ij}s_{ij}. \tag{7}$$

It will be convenient to differentiate partially with respect to the $\sigma^{ij}$ rather 
than with respect to the $\sigma_{ij}$. 
The partial derivative of $\log|\sigma|$ with respect to $\sigma^{ij}$ is 

$$\frac{\partial \log|\sigma|}{\partial\sigma^{ij}} = -(2 - \delta_{ij})\,\sigma_{ij}, \tag{8}$$

where $\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ if $i \neq j$. 

The partial derivatives of (7) are to be found under restrictions (4). A 
convenient formula states that the derivative of any function $f(x, y)$ with 
respect to $x$ under the restriction that $x = y$ is equal to 

$$\left[\frac{\partial f(x, y)}{\partial x} + \frac{\partial f(x, y)}{\partial y}\right].$$

With the help of this formula, it is readily found that 

$$\frac{\partial Q}{\partial\sigma^{11}} = 2\sigma_{11} - s_{11} - s_{22}. \tag{9}$$


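If (9) is placed equal to zero and a “hat” (^) placed over the unobserved 
quantity, it is found that 

$$\hat\sigma_{11}\,(= \hat\sigma_{22}) = \tfrac{1}{2}(s_{11} + s_{22}). \tag{10}$$

Similarly, 

$$\hat\sigma_{33}\,(= \hat\sigma_{44}) = \tfrac{1}{2}(s_{33} + s_{44}), \tag{11}$$

$$\hat\sigma_{12} = s_{12}, \tag{12}$$

$$\hat\sigma_{34} = s_{34}, \tag{13}$$

$$\hat\sigma = S, \tag{14}$$

where $\hat\sigma = \hat\sigma_{13} = \hat\sigma_{14} = \hat\sigma_{23} = \hat\sigma_{24}$ and $S = (s_{13} + s_{14} + s_{23} + s_{24})/4$. Equations 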
(10) through (14) express the maximum likelihood estimates of the unknown 
population variances and covariances in terms of the observed sample 
variances and covariances. 

The population value of the disattenuated correlation is given by (1). 
Since the maximum likelihood estimate of a function of parameters is equal 
to the same function of the maximum likelihood estimates of the parameters, 
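the maximum likelihood estimate of $P_{xy}$ is 

$$\hat P_{xy} = \hat\rho_{13}\big/\sqrt{\hat\rho_{12}\hat\rho_{34}}. \tag{15}$$

Since 

$$\hat\rho_{12} = \frac{\hat\sigma_{12}}{\hat\sigma_{11}} = \frac{2s_{12}}{s_{11} + s_{22}}, \tag{16}$$

$$\hat\rho_{13} = \frac{\hat\sigma_{13}}{\sqrt{\hat\sigma_{11}\hat\sigma_{33}}} = \frac{s_{13} + s_{14} + s_{23} + s_{24}}{2\sqrt{(s_{11} + s_{22})(s_{33} + s_{44})}}, \tag{17}$$

it is readily found that 

$$\hat P_{xy} = \frac{s_{13} + s_{23} + s_{14} + s_{24}}{4\sqrt{s_{12}s_{34}}}. \tag{18}$$

Formula (18) is different from those in the literature with which the writer 
is familiar, but the difference is negligible when $N$ is large. 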


Estimation Under the Null Hypothesis That P = 1 


The next step is to use the same approach to derive maximum likelihood 
estimates under the restrictive hypothesis that $P_{xy} = 1$. An equivalent 
statement of this hypothesis, as indicated previously, is 

$$\sigma_{xy}^2 = \sigma_{12}\sigma_{34}. \tag{19}$$

If Lagrange multipliers $\mu$ and $\nu$ are used, the quantity to be differentiated 
may now be written 

$$q = -\log|\sigma| - \sum_i\sum_j \sigma^{ij}s_{ij} + \mu_1(\sigma^{11} - \sigma^{22}) + \mu_2(\sigma^{33} - \sigma^{44}) + 2\sum_I\sum_J \nu_{IJ}(\sigma_{IJ}^2 - \sigma_{12}\sigma_{34}), \tag{20}$$

with $I = 1, 2$ and $J = 3, 4$. 
It can be shown that 

$$\frac{\partial\sigma_{gh}}{\partial\sigma^{ij}} = (\tfrac{1}{2}\delta_{ij} - 1)(\sigma_{gi}\sigma_{hj} + \sigma_{hi}\sigma_{gj}). \tag{21}$$

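The necessary derivatives of (20) can now be written down. First, 

$$\frac{\partial q}{\partial\sigma^{11}} = \sigma_{11} - s_{11} + \mu_1 + 2\sum_I\sum_J \nu_{IJ}(-2\sigma_{IJ}\sigma_{1I}\sigma_{1J} + \sigma_{12}\sigma_{13}\sigma_{14} + \sigma_{34}\sigma_{11}\sigma_{12}), \tag{22}$$

$$\frac{\partial q}{\partial\sigma^{22}} = \sigma_{22} - s_{22} - \mu_1 + 2\sum_I\sum_J \nu_{IJ}(-2\sigma_{IJ}\sigma_{2I}\sigma_{2J} + \sigma_{12}\sigma_{23}\sigma_{24} + \sigma_{34}\sigma_{12}\sigma_{22}). \tag{23}$$

Now all values of $\sigma_{IJ}$ are equal, and may be denoted simply by the 
symbol $\sigma$. Setting (22) and (23) each equal to zero, adding them together, 
supplying the tilde symbol to designate the maximum likelihood estimates 
obtained under the hypothesis that $P = 1$, and replacing $\tilde\sigma_{22}$ by the equal 
quantity $\tilde\sigma_{11}$, 

$$0 = 2\tilde\sigma_{11} - (s_{11} + s_{22}) + 2\sum_I\sum_J \nu_{IJ}(-2\tilde\sigma^2\tilde\sigma_{11} - 2\tilde\sigma^2\tilde\sigma_{12} + 2\tilde\sigma^2\tilde\sigma_{12} + 2\tilde\sigma_{34}\tilde\sigma_{11}\tilde\sigma_{12}), \tag{24}$$

or 

$$0 = 2\tilde\sigma_{11} - (s_{11} + s_{22}) + 4\sum_I\sum_J \nu_{IJ}(\tilde\sigma_{34}\tilde\sigma_{12} - \tilde\sigma^2)\,\tilde\sigma_{11}. \tag{25}$$

Because of (19), 

$$\tilde\sigma^2 = \tilde\sigma_{12}\tilde\sigma_{34}. \tag{26}$$

The desired maximum likelihood estimator is, therefore, 

$$\tilde\sigma_{11}\,(= \tilde\sigma_{22}) = \tfrac{1}{2}(s_{11} + s_{22}) = \hat\sigma_{11}. \tag{27}$$

A similar equation holds for $\tilde\sigma_{33} = \tilde\sigma_{44}$. 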

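Next, 

$$\frac{\partial q}{\partial\sigma^{GH}} = 2\sigma_{GH} - 2s_{GH} + 2\sum_I\sum_J \nu_{IJ}\bigl[-2\sigma_{IJ}(\sigma_{GI}\sigma_{HJ} + \sigma_{GJ}\sigma_{HI}) + \sigma_{12}(\sigma_{G3}\sigma_{H4} + \sigma_{G4}\sigma_{H3}) + \sigma_{34}(\sigma_{G1}\sigma_{H2} + \sigma_{G2}\sigma_{H1})\bigr], \quad (G = 1, 2;\ H = 3, 4). \tag{28}$$

Placing (28) equal to zero and summing on $G$ and $H$ yields 

$$0 = 4\tilde\sigma - 4S + \tilde\sigma\sum_I\sum_J \nu_{IJ}\sum_G\sum_H(-2\tilde\sigma_{GI}\tilde\sigma_{HJ} - 2\tilde\sigma^2 + \tilde\sigma_{12}\tilde\sigma_{H4} + \tilde\sigma_{12}\tilde\sigma_{H3} + \tilde\sigma_{34}\tilde\sigma_{G1} + \tilde\sigma_{34}\tilde\sigma_{G2}). \tag{29}$$

If $\sum_G\sum_H \tilde\sigma_{GI}\tilde\sigma_{HJ}$ is written out in full, it is seen that, because of its symmetry, 
it will remain unaltered irrespective of the values assumed by $I$ and $J$. If 
all terms of $\sum_G\sum_H$ are written out and collected, it is found that 

$$0 = 2\tilde\sigma - 2S - \tilde\sigma\sum_I\sum_J \nu_{IJ}(\tilde\sigma_{11}\tilde\sigma_{33} - \tilde\sigma_{12}\tilde\sigma_{33} - \tilde\sigma_{11}\tilde\sigma_{34} - 3\tilde\sigma_{12}\tilde\sigma_{34} + 4\tilde\sigma^2). \tag{30}$$

Using (26) and factoring gives the result 

$$2(\tilde\sigma - S) = \tilde\sigma\Bigl(\sum_I\sum_J \nu_{IJ}\Bigr)(\tilde\sigma_{11} - \tilde\sigma_{12})(\tilde\sigma_{33} - \tilde\sigma_{34}). \tag{31}$$
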
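Before doing further work with (31), it will be helpful to work with 

$$\frac{\partial q}{\partial\sigma^{12}} = 2\sigma_{12} - 2s_{12} + 2\sum_I\sum_J \nu_{IJ}\bigl[-2\sigma_{IJ}(\sigma_{1I}\sigma_{2J} + \sigma_{1J}\sigma_{2I}) + \sigma_{12}(\sigma_{13}\sigma_{24} + \sigma_{14}\sigma_{23}) + \sigma_{34}(\sigma_{11}\sigma_{22} + \sigma_{12}^2)\bigr]. \tag{32}$$

Placing (32) equal to zero, it is found that 

$$0 = \tilde\sigma_{12} - s_{12} + \Bigl(\sum_I\sum_J \nu_{IJ}\Bigr)(-2\tilde\sigma_{11}\tilde\sigma^2 + \tilde\sigma_{34}\tilde\sigma_{11}^2 + \tilde\sigma_{34}\tilde\sigma_{12}^2). \tag{33}$$

Using (26) and factoring, 

$$\tilde\sigma_{12} - s_{12} = -\Bigl(\sum_I\sum_J \nu_{IJ}\Bigr)(\tilde\sigma_{11} - \tilde\sigma_{12})^2\,\tilde\sigma_{34}. \tag{34}$$

By symmetry, 

$$\tilde\sigma_{34} - s_{34} = -\Bigl(\sum_I\sum_J \nu_{IJ}\Bigr)(\tilde\sigma_{33} - \tilde\sigma_{34})^2\,\tilde\sigma_{12}. \tag{35}$$

Equations (26), (31), (34), and (35) remain to be solved, the four unknowns 
being $\tilde\sigma$, $\tilde\sigma_{12}$, $\tilde\sigma_{34}$, and the quantity $\sum_I\sum_J \nu_{IJ}$. As a first step, this 
last quantity will be eliminated from (31), (34), and (35). Multiply together 
(34) and (35) and use (26) to obtain 

$$(\tilde\sigma_{12} - s_{12})(\tilde\sigma_{34} - s_{34}) = \Bigl(\sum_I\sum_J \nu_{IJ}\Bigr)^2(\tilde\sigma_{11} - \tilde\sigma_{12})^2(\tilde\sigma_{33} - \tilde\sigma_{34})^2\,\tilde\sigma^2. \tag{36}$$

Square (31), subtract from (36), and substitute $\hat\sigma_{ij}$ for $s_{ij}$ to get 

$$(\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) = 4\tilde\sigma^2 - 8\tilde\sigma\hat\sigma + 4\hat\sigma^2. \tag{37}$$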


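Multiply (34) by $\tilde\sigma_{12}(\tilde\sigma_{33} - \tilde\sigma_{34})$, multiply (31) by $\tilde\sigma(\tilde\sigma_{11} - \tilde\sigma_{12})$, and add the 
results to obtain 

$$2(\tilde\sigma - \hat\sigma)\tilde\sigma(\tilde\sigma_{11} - \tilde\sigma_{12}) + (\tilde\sigma_{12} - \hat\sigma_{12})\tilde\sigma_{12}(\tilde\sigma_{33} - \tilde\sigma_{34}) = 0. \tag{38}$$

Similarly, 

$$2(\tilde\sigma - \hat\sigma)\tilde\sigma(\tilde\sigma_{33} - \tilde\sigma_{34}) + (\tilde\sigma_{34} - \hat\sigma_{34})\tilde\sigma_{34}(\tilde\sigma_{11} - \tilde\sigma_{12}) = 0. \tag{39}$$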


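The next step is to eliminate $\tilde\sigma$ from (37), (38), and (39). 
Multiply (37) by $(\tilde\sigma_{11} - \tilde\sigma_{12})$, add 4 times (38) and use (26), obtaining 

$$4(\tilde\sigma_{12} - \hat\sigma_{12})\tilde\sigma_{12}(\tilde\sigma_{33} - \tilde\sigma_{34}) = (\tilde\sigma_{11} - \tilde\sigma_{12})\bigl[4\hat\sigma^2 - (\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) - 4\tilde\sigma_{12}\tilde\sigma_{34}\bigr]. \tag{40}$$

Similarly, 

$$4(\tilde\sigma_{34} - \hat\sigma_{34})\tilde\sigma_{34}(\tilde\sigma_{11} - \tilde\sigma_{12}) = (\tilde\sigma_{33} - \tilde\sigma_{34})\bigl[4\hat\sigma^2 - (\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) - 4\tilde\sigma_{12}\tilde\sigma_{34}\bigr]. \tag{41}$$
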

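Two possibly simpler equations may be derived from (40) and (41), as 
follows. Multiply (40) by $(\tilde\sigma_{33} - \tilde\sigma_{34})$ and (41) by $(\tilde\sigma_{11} - \tilde\sigma_{12})$ and subtract 
one from the other to obtain 

$$\tilde\sigma_{12}(\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{33} - \tilde\sigma_{34})^2 = \tilde\sigma_{34}(\tilde\sigma_{34} - \hat\sigma_{34})(\tilde\sigma_{11} - \tilde\sigma_{12})^2. \tag{42}$$

Multiply (42) by $\tilde\sigma_{12}(\tilde\sigma_{12} - \hat\sigma_{12})$, extract the square root of both sides of the 
equation, multiply by 4, and subtract from (40). If $(\tilde\sigma_{11} - \tilde\sigma_{12}) \neq 0$, as will 
ordinarily be the case, the resulting equation may be divided by this quantity 
to produce 

$$4\tilde\sigma_{12}\tilde\sigma_{34} + (\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) \pm 4\sqrt{\tilde\sigma_{12}\tilde\sigma_{34}(\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34})} = 4\hat\sigma^2. \tag{43}$$

Take the square root of (43) to obtain, finally, 

$$2\sqrt{\tilde\sigma_{12}\tilde\sigma_{34}} \pm \sqrt{(\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34})} = \pm 2\hat\sigma. \tag{44}$$

No simple explicit solution for $\tilde\sigma_{12}$ and $\tilde\sigma_{34}$ seems to be obtainable. Equations 
(42) and (44) may, of course, be solved by iterative processes. In the 
writer’s experience, however, the iterations have converged slowly. Since 
(40) and (41) are each linear in one of the unknowns, it is probably easier 
to obtain the solution graphically from the intersection of the two curves 
representing these equations. If (40) and (41) are solved for $\tilde\sigma_{34}$ and $\tilde\sigma_{12}$, 
respectively, the resulting equations, from which the curves may be plotted, 
are 

$$\tilde\sigma_{34} = \frac{4B\hat\sigma^2 + AB\hat\sigma_{34} - 4A\tilde\sigma_{12}\tilde\sigma_{33}}{AB + 4\tilde\sigma_{12}(B - A)}, \tag{45}$$

$$\tilde\sigma_{12} = \frac{4D\hat\sigma^2 + CD\hat\sigma_{12} - 4C\tilde\sigma_{34}\tilde\sigma_{11}}{CD + 4\tilde\sigma_{34}(D - C)}, \tag{46}$$

where $A = \tilde\sigma_{12} - \hat\sigma_{12}$, $B = \tilde\sigma_{11} - \tilde\sigma_{12}$, $C = \tilde\sigma_{34} - \hat\sigma_{34}$, $D = \tilde\sigma_{33} - \tilde\sigma_{34}$. 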

In the special case where $s_{11} = s_{33}$, $s_{22} = s_{44}$, $s_{12} = s_{34}$, $s_{13} = s_{24}$, and 
$s_{14} = s_{23}$, symmetry requires that $\tilde\sigma_{12} = \tilde\sigma_{34}$. In this case, (44) is readily 
solved, the solution being 

$$\tilde\sigma_{12} = \tilde\sigma_{34} = \tilde\sigma = \tfrac{1}{3}(s_{12} + s_{13} + s_{23}). \tag{47}$$

This result suggests that whenever the actual data approximate the foregoing 
special case, $(2s_{12} + 4S)/6$ and $(2s_{34} + 4S)/6$ can be used as convenient 
first approximations to $\tilde\sigma_{12}$ and $\tilde\sigma_{34}$, respectively. 

When the values of $\tilde\sigma_{12}$ and $\tilde\sigma_{34}$ have been determined, the value of $\tilde\sigma$ 
may then be obtained from (26). 
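
Where plotting is inconvenient, the intersection of the two curves can also be located numerically. The sketch below is not the graphical procedure described above: it swaps in simple bisection on the composite of (45) and (46), using the unrestricted estimates from the vocabulary example. The function names and the starting bracket of 50 to 53 are assumptions chosen for this particular data set.

```python
import math

# Unrestricted estimates for the vocabulary example, from (10)-(14).
H11, H33 = 86.33055, 97.5521   # variances; by (27) these are also the H0 values
H12, H34 = 57.7751, 73.8201    # covariances between parallel forms
H = 58.687425                  # common covariance between an x form and a y form

def eq45(t12: float) -> float:
    """Equation (45): the H0 value of sigma_34 implied by a trial sigma_12."""
    a, b = t12 - H12, H11 - t12
    return (4*b*H*H + a*b*H34 - 4*a*t12*H33) / (a*b + 4*t12*(b - a))

def eq46(t34: float) -> float:
    """Equation (46): the H0 value of sigma_12 implied by a trial sigma_34."""
    c, d = t34 - H34, H33 - t34
    return (4*d*H*H + c*d*H12 - 4*c*t34*H11) / (c*d + 4*t34*(d - c))

def solve_h0(lo: float = 50.0, hi: float = 53.0, tol: float = 1e-10):
    """Bisection on g(t) = eq46(eq45(t)) - t; the bracket is an assumption."""
    def g(t: float) -> float:
        return eq46(eq45(t)) - t
    if g(lo) * g(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    t12 = 0.5 * (lo + hi)
    return t12, eq45(t12)

t12, t34 = solve_h0()
t_xy = math.sqrt(t12 * t34)    # equation (26)
print(round(t12, 3), round(t34, 3), round(t_xy, 3))
```

For other data the bracket would need to be chosen afresh, for instance around the first approximations suggested by (47).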


Maximum Likelihood Estimation Under $H_1$ 


As previously noted (6), the logic of the mathematical model appropriate 
for the present problem imposes for $H_1$ the restriction that $P_{xy} \le 1$ or, 
equivalently, that $\sigma_{xy}^2 \le \sigma_{12}\sigma_{34}$. The values of $\hat\sigma_{ij}$ represented by (10) through 
(14) were obtained without attention to this restriction. These values provide 
maximum likelihood estimators of $\sigma_{ij}$ under $H_1$ only so long as they do not 
violate (6). If the symbols $\check\sigma_{ij}$ are used to denote those values of $\sigma_{ij}$ consistent 
with (6) for which (3) is maximized, then it can be shown that 

$$\check\sigma_{ij} = \begin{cases} \hat\sigma_{ij}, & \text{if these values satisfy (6),} \\ \tilde\sigma_{ij}, & \text{otherwise.} \end{cases} \tag{48}$$

The $\check\sigma_{ij}$ are the maximum likelihood estimates of the $\sigma_{ij}$ under $H_1$. Whenever 
the $\hat\sigma_{ij}$ violate (6), the maximum likelihood estimates under $H_1$ and under 
$H_0$ are identical. 


The Likelihood Ratio Test 


The likelihood ratio for testing the hypothesis that $P = 1$ is ([6], p. 257) 

$$\lambda = \frac{L(\tilde\sigma_{11}, \tilde\sigma_{12}, \ldots, \tilde\sigma_{44})}{L(\check\sigma_{11}, \check\sigma_{12}, \ldots, \check\sigma_{44})}. \tag{49}$$



This ratio will equal 1 whenever $\hat P_{xy} > 1$. Because $H_1$ involves an inequality, 
the usual theorems on the large-sample distributions of likelihood ratios 
cannot be applied blindly. Application of a recent result obtained by Chernoff 
[1], however, shows that in large samples 

$$\begin{aligned} \text{Prob}\,(-2\ln\lambda < 0) &= 0, \\ \text{Prob}\,(-2\ln\lambda = 0) &= \tfrac{1}{2}, \\ \text{Prob}\,(-2\ln\lambda \le \chi_1^2) &= \tfrac{1}{2} + \tfrac{1}{2}F(\chi_1^2), \end{aligned} \tag{50}$$

where $\ln\lambda$ is the natural logarithm of $\lambda$, and $F(\chi_1^2)$ is the cumulative distribution 
function of a quantity distributed as chi square with one degree of 
freedom. 
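It can be shown that 

$$\sum_i\sum_j \check\sigma^{ij}s_{ij} = 4, \tag{51}$$

and that 

$$\sum_i\sum_j \tilde\sigma^{ij}s_{ij} = 4. \tag{52}$$

From (49), (3), (51), and (52), 

$$\lambda = \left(\frac{|\check\sigma|}{|\tilde\sigma|}\right)^{(N-1)/2}. \tag{53}$$

The large-sample significance test is carried out by computing 

$$\chi_1^2 = -2\ln\lambda = (N - 1)(\ln|\tilde\sigma| - \ln|\check\sigma|) = 2.3026\,(N - 1)(\log_{10}|\tilde\sigma| - \log_{10}|\check\sigma|). \tag{54}$$
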

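Convenient formulas for computing the value of the determinants in 
(54) are readily found: 

$$\begin{aligned} |\check\sigma| &= (\check\sigma_{11} - \check\sigma_{12})(\check\sigma_{33} - \check\sigma_{34})\bigl[(\check\sigma_{11} + \check\sigma_{12})(\check\sigma_{33} + \check\sigma_{34}) - 4\check\sigma_{13}^2\bigr] \\ &= \check\sigma_{11}^2\check\sigma_{33}^2(1 - \check\rho_{12})(1 - \check\rho_{34})\bigl[(1 + \check\rho_{12})(1 + \check\rho_{34}) - 4\check\rho_{13}^2\bigr], \end{aligned} \tag{55}$$

where $\check\rho_{13} = \check\sigma_{13}/\sqrt{\check\sigma_{11}\check\sigma_{33}}$, $\check\rho_{12} = \check\sigma_{12}/\check\sigma_{11}$, etc. The formula for $|\tilde\sigma|$ is obtained 
simply by switching diacritical marks in (55). 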
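
The first line of (55) can be checked against a direct expansion of the determinant. The sketch below builds a covariance matrix with the parallel-forms pattern of (2) and compares the closed form with a cofactor expansion; the helper names (`det`, `patterned`, `det_closed_form`) are illustrative, and the numbers are the unrestricted estimates from the vocabulary example.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (adequate for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def patterned(v_x, v_y, c_xx, c_yy, c_xy):
    """Covariance matrix satisfying the parallelism restrictions in (2)."""
    return [
        [v_x,  c_xx, c_xy, c_xy],
        [c_xx, v_x,  c_xy, c_xy],
        [c_xy, c_xy, v_y,  c_yy],
        [c_xy, c_xy, c_yy, v_y],
    ]

def det_closed_form(v_x, v_y, c_xx, c_yy, c_xy):
    """First line of equation (55)."""
    return (v_x - c_xx) * (v_y - c_yy) * ((v_x + c_xx) * (v_y + c_yy) - 4 * c_xy ** 2)

args = (86.3305, 97.5521, 57.7751, 73.8201, 58.6874)
brute = det(patterned(*args))
closed = det_closed_form(*args)
print(brute, closed)
```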
It is worth noting that the ratio $|\tilde\sigma|/|\check\sigma|$ does not involve any variances, 
being expressible solely in terms of correlation coefficients. Thus (54) may be 
rewritten 

$$\chi_1^2 = 2.3026\,(N - 1)\log_{10}\frac{(1 - \tilde\rho_{12})(1 - \tilde\rho_{34})\bigl[(1 + \tilde\rho_{12})(1 + \tilde\rho_{34}) - 4\tilde\rho_{13}^2\bigr]}{(1 - \check\rho_{12})(1 - \check\rho_{34})\bigl[(1 + \check\rho_{12})(1 + \check\rho_{34}) - 4\check\rho_{13}^2\bigr]}. \tag{56}$$

Although the significance test can be written solely in terms of the $\tilde\rho_{ij}$ and 
$\check\rho_{ij}$, as in (56), it cannot in general be written solely in terms of the usual 
sample product moment correlations, $r_{ij}$. 

With the aid of (50), the computed value of $\chi_1^2$ may be looked up in a 
table for chi square with 1 degree of freedom. Since $\chi_1$ is strictly normally 
distributed ([7], p. 408), however, it will be more convenient to work with 
$\chi_1$ instead of $\chi_1^2$ and to use the normal curve table instead of the chi square 
table. Only the positive tail of the normal distribution should be used to 
obtain the significance level. Thus $\chi_1 = 1.64$ is at the five per cent level, 
$\chi_1 = 1.96$ is at the 2½ per cent level, and $\chi_1 = 2.33$ is at the one per cent level. 

If for any given set of data the value of $\hat P_{xy}$ is found greatly to exceed 
1.00, the experimenter should reconsider the assumptions underlying the 
foregoing significance test. Too large a value of $\hat P_{xy}$ may be due to lack of 
independence in the errors of measurement of the test scores, or to lack of 
parallelism between supposedly parallel tests. Lack of parallelism can be 
tested by use of Votaw’s test [8], as already noted. Independence of the 
errors of measurement must be guaranteed in advance by careful experimental 
design. 
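
As a rough check on the vocabulary example, the statistic in (56) can be evaluated from the correlation estimates printed in the numerical example. The sketch below uses the natural-logarithm form of (56) and the upper tail of the normal curve; the function names `chi_square_56` and `upper_tail` are illustrative assumptions. Because the printed correlations are rounded, the result should land near, not exactly on, the reported value of 5.94.

```python
import math

def chi_square_56(n, rho_h0, rho_h1):
    """Equation (56); each argument is a tuple (rho_12, rho_34, rho_13)."""
    def factor(r12, r34, r13):
        return (1 - r12) * (1 - r34) * ((1 + r12) * (1 + r34) - 4 * r13 ** 2)
    return (n - 1) * math.log(factor(*rho_h0) / factor(*rho_h1))

def upper_tail(z):
    """Area of the standard normal curve beyond z (positive tail only)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

rho_h0 = (0.59815, 0.73058, 0.66106)      # estimates under the null hypothesis
rho_h1 = (0.669231, 0.756725, 0.639505)   # estimates under the alternative
chi_sq = chi_square_56(649, rho_h0, rho_h1)
chi = math.sqrt(chi_sq)
print(round(chi_sq, 2), round(chi, 2))

# Reference points quoted in the text for the normal curve table.
for z in (1.64, 1.96, 2.33):
    print(z, round(upper_tail(z), 4))
```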
MARKOV PROCESSES IN LEARNING THEORY 


JOHN G. KEMENY AND J. LAURIE SNELL* 
DARTMOUTH COLLEGE 


Consideration is given to mathematical problems arising in two learning 
theories—one developed by Bush and Mosteller, the other developed by 
Estes. The theory of Bush and Mosteller leads to a class of Markov processes 
which have been studied in considerable detail (see [1] and [7]). The Estes 
model can be treated as a Markov chain, i.e., a Markov process with a finite 
number of states. For an important class of special cases, it is shown that the 
Bush-Mosteller model is, in a sense, a limiting form of the Estes model. The 
limiting probability distributions are derived for the cases treated in both 
models. 


1. Summary of Results 


The Markov chains discussed in the early sections arise from a study of 
a learning model proposed by Estes [3]. Only a special case which leads to 
the Markov chains will be discussed in this paper. 

In a sequence of experiments, a subject is to give one of two responses, 
$R_1$ or $R_2$. After his response, the experimenter makes one of two possible 
reinforcing actions $A_1$ or $A_2$. It is assumed that the choice of the subject is 
determined (in a probabilistic sense) by a set of n stimulus elements. Each 
of these stimulus elements at the beginning of an individual experiment 
is connected to one of the two possible responses. These connections change 
as the experiments proceed. 

Before making his choice, the subject either samples or does not sample 
each of these stimulus elements. It is assumed that the sampling gives a set 
of n independent trials with probability $\theta$ that a particular stimulus element 
be sampled (in the more general model this probability depends upon the 
stimulus element). It is assumed that the probability that the subject makes 
response $R_1$ on a given experiment is equal to the proportion of elements 
connected to $R_1$ in the set which he samples. 

The choice of the experimenter is assumed to change all the connections 
of the stimulus elements of the set sampled to agree with the choice. For 
example, if the experimenter does $A_1$ then all elements which were sampled 
and which were connected to $R_2$ become connected to $R_1$. All other elements 
are unchanged. 

*This research was supported by the National Science Foundation through a grant 
given to the Dartmouth Mathematics Project. 

To specify completely the process, it is necessary to give the method 
used by the experimenter to generate his A’s. Various different possibilities 
lead to different processes. In this paper consideration is restricted to the 
case where the experimenter chooses $A_1$ with probability $p$ and $A_2$ with 
probability $(1 - p)$, independent of the choice of the subject. In this case, 
a Markov chain is obtained by taking as a state the number of stimulus 
elements connected to $R_1$ at the beginning of a single experiment. This is the 
Markov chain given in section 2. 

In section 3, a method for calculating the limiting distribution for these 
Markov chains is given. In section 4 it is shown that these distributions also 
arise from a very simple probabilistic process. In many applications of the 
Estes theory the learning parameter $\theta$ is believed to be small. In section 5 
it is shown that for small values of $\theta$ the limiting distribution may be approximated 
by the binomial distribution with parameter $p$. 

Once the limiting distribution for the above Markov chain is known, 
it is possible to find the limiting probability for a response of each type by 
the subject. It is then possible to predict from the model how the subject 
will behave, in terms of how the experimenter proceeds. For example, in the 
case studied in this paper, the model predicts that, if the experimenter chooses 
action $A_1$ with probability $p$, the limiting probability that the subject 
will choose $R_1$ is $p$. In other words, the subject will tend to match the behavior 
of the experimenter. Experiments have verified that this is generally true. 
For a more detailed discussion of these results the reader is referred to 
Kemeny, Snell, and Thompson [8]. 

In section 6 it is shown that, for fixed value of $\theta$, as the number of stimulus 
elements tends to infinity the limiting distributions tend to a distribution 
F which depends only on $\theta$. In section 7 it is shown that distribution F is 
the limiting distribution for a Markov process of the Bush-Mosteller type. 
This result connects the two theories and suggests that the Bush-Mosteller 
model may be thought of as a limiting case of the Estes model. 

In sections 8 through 11 the distribution F is studied. It is shown that 
its character depends essentially on the value of $\theta$. As in the case of the 
limiting distribution for the Estes model, it is possible to give a very simple 
probability process which leads to the distribution F. However, the function 
F itself is extremely complicated for most values of $\theta$. 

It is interesting to note that some famous pathological distributions, 
constructed by mathematicians as counterintuitive examples, actually occur 
in a practical application. For example, the Cantor set can be defined as 
the set of all numbers on the unit interval whose expansion in the number 
system to the base 3 does not contain the digit 1. For one value of $\theta$ the 
distribution F is concentrated on this strange set. 


2. A Class of Markov Chains 


In this section the class of Markov chains that arise in the Estes learning 
model is discussed. These chains can be described in terms of drawing balls 
out of urns. 





Two urns, A and B, contain altogether n balls. An urn is chosen, and 
each ball in that urn is removed with some fixed probability and placed in 
the other urn. Assume that the probability that urn A is chosen is 1 - p, 
and that urn B is chosen is p. Assume that each ball is chosen with probability 
$\theta$, and that the choices are independent of one another. 

Form a Markov chain by taking as states the number of balls in urn A. 
A single step then is the result of the transfer of a certain number of balls 
from one urn to the other. The transition probabilities are: 


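$$p_{j,j+k} = p \binom{n-j}{k} \theta^k (1-\theta)^{n-j-k}, \qquad 0 < k \le n-j,$$

$$p_{j,j-k} = (1-p) \binom{j}{k} \theta^k (1-\theta)^{j-k}, \qquad 0 < k \le j,$$

$$p_{jj} = p(1-\theta)^{n-j} + (1-p)(1-\theta)^{j}.$$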


For each choice of n and $\theta$ the resulting Markov chain will have a 
limiting distribution. Denote the limiting probability of being in state j by 
$q_j^n(\theta)$ or simply by $q_j^n$ in any discussion where $\theta$ is fixed. 
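
The transition probabilities of this section translate directly into a small computation. The following is a minimal sketch and not part of the original paper: the function names `transition_matrix` and `limiting_distribution` are illustrative, and the limiting distribution is only approximated here by repeated multiplication of a starting vector by the transition matrix.

```python
from math import comb

def transition_matrix(n, theta, p):
    """Row j holds the probabilities of moving from j balls in urn A to each state."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        # Urn B is chosen (probability p): k of its n - j balls move to urn A.
        for k in range(n - j + 1):
            P[j][j + k] += p * comb(n - j, k) * theta**k * (1 - theta) ** (n - j - k)
        # Urn A is chosen (probability 1 - p): k of its j balls move to urn B.
        for k in range(j + 1):
            P[j][j - k] += (1 - p) * comb(j, k) * theta**k * (1 - theta) ** (j - k)
    return P

def limiting_distribution(P, steps=2000):
    """Approximate the limiting distribution by iterating q <- qP."""
    size = len(P)
    q = [1.0 / size] * size
    for _ in range(steps):
        q = [sum(q[i] * P[i][j] for i in range(size)) for j in range(size)]
    return q

P = transition_matrix(4, 0.5, 0.5)
q = limiting_distribution(P)
print([round(v, 6) for v in q])
```

The two terms with k = 0 both fall on the diagonal, which reproduces the expression for $p_{jj}$.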


3. A Recursion Relation for the Limiting Probabilities 


In this section a recursion relation is obtained which enables one to 
calculate the limiting probabilities $q_j^n(\theta)$ for a given n from the 
knowledge of these probabilities for smaller values of n. 

In order to do this, consider a slightly different description of the Markov 
chain. Suppose the order is reversed: first draw out each of the balls from 
each of the urns with probability $\theta$ for each ball; then take the subset of 
balls obtained and with probability p put them all in urn A and with 
probability 1 - p put them in urn B. Then the same Markov chain described in 
section 2 is obtained. 

With this description, consider the limiting probability that there are j 
balls in urn A. Obtain this probability by finding first the probability that 
$k \le j$ balls are chosen and put into urn A, and that of the n - k not chosen 
j - k were in urn A; and secondly the probability that $k \le n - j$ are chosen 
and put into urn B, and that of the n - k not chosen j were in urn A. This leads 
to the following recursion relations: 

$$q_j^n = p \sum_{k=0}^{j} \binom{n}{k} \theta^k (1-\theta)^{n-k}\, q_{j-k}^{\,n-k} + (1-p) \sum_{k=0}^{n-j} \binom{n}{k} \theta^k (1-\theta)^{n-k}\, q_j^{\,n-k}, \tag{2}$$

$$q_0^0 = 1.$$


4. An Auxiliary Process 


As is often the case with Markov chains of this kind, there is a rather 
simple way to describe the limiting distribution. 

Consider the n balls outside the two urns. Choose each ball with probability 
$\theta$ and put the subset obtained into urn A with probability p, and 
into urn B with probability 1 - p. Repeat this procedure on the set of balls 
which were not chosen the first time. Continue this process until all the balls 
are put into the urns. Then the limiting probability $q_j^n$ for our Markov chain 
is the probability that this new process puts j balls in urn A. The proof of 
this consists in observing that these probabilities also satisfy the set of 
recursion relations (2). This process will be referred to as the auxiliary 
process. 
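
The auxiliary process is easy to simulate. The sketch below is illustrative only: the names `auxiliary_process` and `estimate_q`, the number of trials, and the seed are assumptions made for this example, and the output is a Monte Carlo estimate rather than an exact result.

```python
import random

def auxiliary_process(n, theta, p, rng):
    """Run the auxiliary process once; return the number of balls placed in urn A."""
    remaining, in_a = n, 0
    while remaining > 0:
        # Each ball still outside the urns is chosen independently with probability theta.
        chosen = sum(1 for _ in range(remaining) if rng.random() < theta)
        # The whole chosen subset goes to urn A with probability p, otherwise to urn B.
        if rng.random() < p:
            in_a += chosen
        remaining -= chosen
    return in_a

def estimate_q(n, theta, p, trials=20000, seed=0):
    """Relative frequencies of the final number of balls in urn A."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        counts[auxiliary_process(n, theta, p, rng)] += 1
    return [c / trials for c in counts]

q_hat = estimate_q(4, 0.5, 0.5)
print(q_hat)
```

The relative frequencies can be compared with the limiting distribution of the Markov chain of section 2 for the same n, $\theta$, and p.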

The recursion relation (2) does not seem to have a simple solution. 
However, the situation can be simplified somewhat as follows. Let $s_n = q_n^n$. 
That is, $s_n$ is the limiting probability that all the balls are in urn A. Then 
from (2) one obtains the following recursion relation for $s_n$: 

$$s_n\,[1 - (1-\theta)^n] = p \sum_{m=0}^{n-1} \binom{n}{m} (1-\theta)^m \theta^{n-m} s_m, \tag{3}$$

$$s_0 = 1.$$


All of the other limiting probabilities can be obtained from the knowledge 
of the $s_n$. In the auxiliary process, number the balls. Let $A_j$ be the event 
that the jth ball ends up in urn A. Then 

$$P[A_j] = s_1,$$

and in general the probability that some particular r of the events occur is 
$s_r$. Moreover $q_j^n$ is the probability that exactly j of these events occur. 
Then (see Feller [4], p. 64) 

$$q_j^n = \binom{n}{j} \sum_{k=0}^{n-j} (-1)^k \binom{n-j}{k} s_{j+k}. \tag{4}$$
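
Recursion (3) and formula (4) together give a direct way of computing the limiting probabilities. The sketch below uses exact rational arithmetic; the function names `s_values` and `q_values` are illustrative and not taken from the paper.

```python
from fractions import Fraction
from math import comb

def s_values(n, theta, p):
    """s_0..s_n from recursion (3): s_m is the chance that m given balls all end in urn A."""
    s = [Fraction(1)]
    for m in range(1, n + 1):
        total = sum(comb(m, i) * (1 - theta) ** i * theta ** (m - i) * s[i] for i in range(m))
        s.append(p * total / (1 - (1 - theta) ** m))
    return s

def q_values(n, theta, p):
    """q_0^n..q_n^n from formula (4), an inclusion-exclusion sum over the s values."""
    s = s_values(n, theta, p)
    return [
        comb(n, j) * sum((-1) ** k * comb(n - j, k) * s[j + k] for k in range(n - j + 1))
        for j in range(n + 1)
    ]

half = Fraction(1, 2)
print(q_values(4, half, half))
```

Passing `Fraction` arguments keeps every intermediate value exact, which avoids cancellation error in the alternating sum of (4).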


5. Some Limiting Cases 


If in (3) $\theta$ approaches zero, the recursion relation is 

$$s_n = p\,s_{n-1}, \quad (n \ge 1)$$

$$s_0 = 1.$$

The solution of this is $s_n = p^n$. From (4), the limit of $q_j^n$ as 
$\theta \to 0$ is the binomial distribution with parameter p. That is 

$$\lim_{\theta \to 0} q_j^n(\theta) = \binom{n}{j} p^j (1-p)^{n-j}.$$

This is intuitively clear, since in our auxiliary process, $\theta$ near zero 
means essentially putting the balls in the urns one at a time. If they were put 
in one at a time, the binomial distribution would be obtained. 

If in (3) $\theta$ approaches one, then the recursion equations become 

$$s_n = p\,s_0,$$

$$s_0 = 1.$$

From (4) 

$$\lim_{\theta \to 1} q_j^n(\theta) = \begin{cases} p, & (j = n) \\ 0, & (0 < j < n) \\ 1 - p, & (j = 0). \end{cases}$$

This again is intuitively clear, since, for $\theta$ near one in this auxiliary 
process, it is almost certain that all the balls will be chosen at once, and 
then it is just a matter of which urn they are placed in. 

Finally consider the case $\theta = 1/2$, p = 1/2. In this case 
$q_j^n = 1/(n + 1)$ for all j, a uniform distribution. This case, unlike the 
last two, does not seem to be suggested by intuition. 


6. The Limiting Distribution for Large n 


Referring again to the auxiliary process, let $X_j$ be a random variable 
which is 1 if the jth ball is put in urn A and 0 otherwise. Let 

$$S_n = (X_1 + X_2 + \cdots + X_n)/n.$$

The distribution of $S_n$, then, is the limiting distribution for our Markov 
chain normalized by dividing by n. The distribution of $S_n$ is a discrete 
distribution on the unit interval which puts mass $q_j^n$ at the point j/n. 

The jth moment of $S_n$ is, by the multinomial expansion, 

$$E\{S_n^j\} = \frac{1}{n^j} \sum_{r_1 + r_2 + \cdots + r_n = j} \frac{j!}{r_1!\, r_2! \cdots r_n!}\; E\{X_1^{r_1} X_2^{r_2} \cdots X_n^{r_n}\}.$$

Here $E\{X_1^{r_1} X_2^{r_2} \cdots X_n^{r_n}\} = s_k$, where k is the number of 
$r_i$ which are not zero. Thus, rewrite the sum in the form 

$$E\{S_n^j\} = \sum_{k=1}^{j} a_k s_k, \quad \text{where} \quad \sum_{k=1}^{j} a_k = 1.$$

If k > n, then $a_k = 0$. In the case $k = j \le n$, all $r_i$ are 0 or 1. The 
coefficient of the term $s_j$ in this case is 

$$a_j = [n(n-1) \cdots (n-j+1)]/n^j, \quad (j \le n).$$

Letting n tend to infinity, for fixed j, $a_j$ tends to 1 and, since the sum of 
the $a_k$ is one, the remaining terms must tend to 0. Thus 

$$\lim_{n \to \infty} E\{(S_n)^j\} = s_j.$$

Thus the random variables $S_n$ converge in distribution to a random variable 
S whose moments are given by $s_1, s_2, \cdots$. Denote the distribution of S 
by $F(\lambda)$. 
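
As a worked instance of the expansion in this section (an illustration added here, using only the facts that $X_i^2 = X_i$ and that the expectation of a product of two distinct indicators is $s_2$), take j = 2:

$$E\{S_n^2\} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{l=1}^{n} E\{X_i X_l\} = \frac{1}{n^2}\left[n\, s_1 + n(n-1)\, s_2\right] = \frac{1}{n}\, s_1 + \frac{n-1}{n}\, s_2 .$$

Here $a_1 = 1/n$ and $a_2 = (n-1)/n$ sum to one, and as n tends to infinity the second moment tends to $s_2$.
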


7. The Bush-Mosteller Markov Process 


In the last section it was shown that, when n approaches infinity, the 
distribution determined by the weights $q_j^n$ tends to a limiting distribution 
$F(\lambda)$ on the unit interval. It will now be shown that $F(\lambda)$ is the 
limiting distribution of the Markov process defined by 

$$x \to \begin{cases} (1-\theta)x + \theta \\ (1-\theta)x. \end{cases}$$

This is a process which moves on the unit interval so that if at any time it is 
at x, it moves on the next step to $(1-\theta)x + \theta$, with probability p, and to 
$(1-\theta)x$ with probability 1 - p. This is a special case of the Bush-Mosteller 
learning process. 

If this process starts at x, then the possible positions after n steps can 
be written as 

$$(1-\theta)^n x + \sum_{j=1}^{n} a_j \theta (1-\theta)^{j-1},$$

where $a_j$ is 1 or 0. The probability of being at any particular such point is 
$p^r(1-p)^{n-r}$, where r is the number of the $a_j$ which are 1. 

Thus the distribution of the position of the process after n steps is the 
same as that of the random variable 

$$S_n = (1-\theta)^n x + \sum_{j=1}^{n} \epsilon_j \theta (1-\theta)^{j-1},$$

where $\epsilon_j$, j = 1, 2, ..., are independent random variables such that 
$\epsilon_j = 1$ or 0 with probability p and 1 - p, respectively. Thus any 
question involving the distribution of the position can be answered by studying 
the sequence $\{S_n\}$, n = 1, 2, .... 

It is clear that the distribution of $S_n$ approaches a limiting distribution 
independent of the initial point x, and this distribution is in fact the 
distribution of the infinite sum 

$$S = \sum_{j=1}^{\infty} \epsilon_j \theta (1-\theta)^{j-1}.$$








To show that this limiting distribution is the same as $F(\lambda)$ found in 
section 6, it is sufficient to show that the moments of the distribution of S 
are the same as those of $F(\lambda)$. 

Let $F_n$ be the distribution of $S_n$. Then 

$$E\{(S_{n+1})^j \mid S_n = x\} = p[(1-\theta)x + \theta]^j + (1-p)[(1-\theta)x]^j.$$

Thus 

$$E\{(S_{n+1})^j\} = \int p[(1-\theta)x + \theta]^j \, dF_n + \int (1-p)[(1-\theta)x]^j \, dF_n.$$

Passing to the limit, 

$$E\{S^j\} = p \sum_{k=0}^{j} \binom{j}{k} (1-\theta)^k \theta^{j-k} E\{S^k\} + (1-p)(1-\theta)^j E\{S^j\},$$

$$E\{S^0\} = 1.$$

On collecting the terms in $E\{S^j\}$ on the left, this recursion relation is 
the same as (3), found for the values $s_j$. Thus the jth moment of the limiting 
distribution is the same as that of $F(\lambda)$, namely $s_j$. Therefore the 
limiting distribution must itself be $F(\lambda)$. 
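
The agreement of moments can be checked numerically. The sketch below is an illustration, not part of the paper: it builds the exact distribution of the position after n steps starting from 0 and compares its moments with the values given by recursion (3). The helper names and the choice $\theta = 1/2$, p = 1/3, n = 12 are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def n_step_distribution(theta, p, n, start=Fraction(0)):
    """Exact distribution of the position after n steps, as {position: probability}."""
    dist = {start: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for x, w in dist.items():
            up = (1 - theta) * x + theta   # reached with probability p
            down = (1 - theta) * x         # reached with probability 1 - p
            nxt[up] = nxt.get(up, 0) + w * p
            nxt[down] = nxt.get(down, 0) + w * (1 - p)
        dist = nxt
    return dist

def moment(dist, j):
    """jth moment of a finite distribution given as {position: probability}."""
    return sum(w * x**j for x, w in dist.items())

def s_values(j_max, theta, p):
    """Moments s_0..s_j_max of the limiting distribution from recursion (3)."""
    s = [Fraction(1)]
    for m in range(1, j_max + 1):
        total = sum(comb(m, i) * (1 - theta) ** i * theta ** (m - i) * s[i] for i in range(m))
        s.append(p * total / (1 - (1 - theta) ** m))
    return s

theta, p = Fraction(1, 2), Fraction(1, 3)
dist = n_step_distribution(theta, p, 12)
s = s_values(3, theta, p)
for j in (1, 2, 3):
    print(j, float(moment(dist, j)), float(s[j]))
```

After n steps the position differs from the limit by at most $(1-\theta)^n$, so the jth moments differ by at most $j(1-\theta)^n$.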

If $\theta$ tends to 0, then $E\{S^j\}$ approaches $p^j$, as was shown in section 5. 
Thus the limiting distribution will be concentrated at the point p. 


8. The Distribution of S for the Case $\theta > 1/2$ 


Since S is independent of the initial position, assume that the initial 
position is 0. Then 

$$S_n = \sum_{j=1}^{n} \epsilon_j \theta (1-\theta)^{j-1}, \quad (n \ge 1)$$

$$S_0 = 0.$$

Assume that the first n of the $\epsilon_j$ are known, i.e., that the position 
of $S_n$, call it x, is known. What then are the possibilities for S? If 
$\epsilon_{n+1} = 1$, then $S \ge x + \theta(1-\theta)^n$. On the other hand, if 
$\epsilon_{n+1} = 0$, then the most that S could be is found by making all 
future $\epsilon_j$'s equal to 1. That is 

$$S \le x + \sum_{j=n+2}^{\infty} \theta (1-\theta)^{j-1} = x + (1-\theta)^{n+1}.$$

Thus, if $S_n = x$, then it is impossible that S be in the open interval 
$(x + (1-\theta)^{n+1},\; x + \theta(1-\theta)^n)$. The intervals obtained in this 
way by considering all possible values for $S_n$, for all possible n, are 
disjoint and have total length 1. The value of S cannot lie in these intervals. 
The possible values for S are all x such that 

$$x = \sum_{j=1}^{\infty} a_j \theta (1-\theta)^{j-1},$$

where $a_j$ is 0 or 1. The set of all such x is a set of measure zero. In the case 
$\theta = 2/3$, this set is the Cantor set. 
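
For $\theta = 2/3$ the connection with the Cantor set can be checked on finite sums. The sketch below is illustrative (the names `possible_positions` and `ternary_digits` are not from the paper): it enumerates every value of the sum truncated to n terms and verifies that none of its first n base-3 digits is 1.

```python
from fractions import Fraction
from itertools import product

def possible_positions(theta, n):
    """All values of sum_{j=1..n} a_j * theta * (1 - theta)**(j - 1) with a_j in {0, 1}."""
    return sorted(
        sum(a * theta * (1 - theta) ** j for j, a in enumerate(digits))
        for digits in product((0, 1), repeat=n)
    )

def ternary_digits(x, n):
    """First n base-3 digits of a rational number x with 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 3
        d = int(x)
        digits.append(d)
        x -= d
    return digits

points = possible_positions(Fraction(2, 3), 6)
print(all(1 not in ternary_digits(x, 6) for x in points))
```

With $\theta = 2/3$ each term equals $2a_j/3^j$, so the jth base-3 digit is $2a_j$, which is either 0 or 2.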
The function $F(\lambda)$ is constant on the interior of all the intervals 
described above which cannot contain S. Moreover, if 
$x = \sum_{j=1}^{n} a_j \theta (1-\theta)^{j-1}$ is a possible position for $S_n$, 
then S will lie in the interval $[x,\; x + (1-\theta)^n]$ if and only if 
$\epsilon_j = a_j$ for j = 1, ..., n. This will occur with probability 
$p^r(1-p)^{n-r}$, where r is the number of the $a_j$ which are 1. Thus 

$$F(x + (1-\theta)^n) - F(x) = p^r (1-p)^{n-r}, \quad (n \ge 1).$$


It is clear that this determines F completely and in fact that F is a 
continuous function but not absolutely continuous. For a detailed discussion 
of this function F for the case $\theta = 2/3$, see Hille and Tamarkin [5]. 

It is interesting to observe the following fact for the case $\theta > 1/2$. Let 
E be the set of possible values for S. If the process begins in E, then it never 
enters the complement of E. On the other hand if it starts in the complement 
of E, it never enters E. But as n increases the position approaches E, and the 
limiting distribution is independent of the starting position. 


9. The Case $\theta = 1/2$ 


In this case the possible values of S are all real numbers on the unit 
interval [0, 1]. This is the case since the possible values of S are numbers 
$x = \sum_j a_j/2^j$, where $a_j$ is 0 or 1. But such an x is the number on the unit 
interval whose dyadic representation is $.a_1 a_2 a_3 \cdots$. 

Let [a, b] be an interval with 

$$a = .a_1 a_2 \cdots a_n 000\cdots,$$

$$b = .a_1 a_2 \cdots a_n 111\cdots,$$

in dyadic representation. Then 

$$F(b) - F(a) = p^r (1-p)^{n-r},$$

where r is the number of the $a_j$ which are 1. 

In the case p = 1/2, $1/2^n$ is the difference of F on such an interval. 
This determines $F(\lambda)$ as the uniform distribution, i.e., $F(\lambda) = \lambda$. 

In the case $p \ne 1/2$, it will be shown that $F(\lambda)$ is not absolutely 
continuous. Let R be the set of all x between 0 and 1 which, when expressed in 
dyadic form, have the property that the proportion of 1’s in the first n digits 
approaches p. Then, by the law of large numbers, the measure determined 
by $F(\lambda)$ assigns measure one to the set R. On the other hand for p = 1/2 the 
limiting distribution determines the ordinary Lebesgue measure. By the law 
of large numbers, this measure assigns measure one to the set of points whose 
dyadic representation is such that the proportion of 1’s in the first n digits 
approaches 1/2. Hence, R has Lebesgue measure 0. Thus the measure 
determined by $F(\lambda)$ assigns measure one to a set of Lebesgue measure 0. Thus 
$F(\lambda)$ is not absolutely continuous. 


10. The Distribution of S for $\theta < 1/2$, p = 1/2 


A sequence of values for $\theta$ will be given which approach zero and such that 
the limiting distribution for these values is absolutely continuous. It was 
shown by Jessen and Wintner [6] that the distribution of S in this case is 
either absolutely continuous or purely singular. Erdős [2] on the other hand 
showed that for a certain class of values of $\theta$ the distribution is purely singular. 
It is not known whether this class includes values arbitrarily near 0, but it 
does include values less than 1/2. One such choice for $\theta$ occurs when $1 - \theta$ 
is the “Fibonacci” value $\tfrac{1}{2}(\sqrt{5} - 1)$. 

The sequence of values for which the limiting distribution is absolutely 
continuous consists of the sequence $\theta_n = 1 - 1/\sqrt[n]{2}$, n = 1, 2, .... 
For such a value of $\theta$ 

$$S = \sum_{j=1}^{\infty} \epsilon_j \,(1 - 1/\sqrt[n]{2})\,(1/\sqrt[n]{2})^{j-1}.$$

This sum can be written as the sum of n subsequences, each of which is a 
constant times a series of the form $\sum_{j=1}^{\infty} \epsilon_j/2^j$, where 
$\epsilon_j$ is 0 or 1 with probability 1/2. Thus 

$$S = \sum_{j=1}^{n} a_j X_j,$$

where the $X_j$, j = 1, ..., n, are independent random variables with a 
uniform distribution on the interval [0, 1]. The actual values of $a_j$ are given 
by 

$$a_j = 2\,(1 - 1/\sqrt[n]{2})\, 2^{-(j-1)/n},$$

for j = 1, 2, ..., n. 
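
The regrouping into n subsequences can be written out as follows. This is a worked step added for illustration; the index m and the abbreviation $\rho = 1/\sqrt[n]{2}$ are introduced here and do not appear in the original. Writing each index as j + nm with j = 1, ..., n and m = 0, 1, 2, ..., and using $\rho^n = 1/2$,

$$S = \sum_{j=1}^{n} \theta_n \rho^{\,j-1} \sum_{m=0}^{\infty} \epsilon_{j+nm}\, 2^{-m} = \sum_{j=1}^{n} 2\theta_n \rho^{\,j-1} X_j, \qquad X_j = \sum_{m=0}^{\infty} \frac{\epsilon_{j+nm}}{2^{\,m+1}} .$$

Each $X_j$ is a random dyadic expansion with equally likely digits and is therefore uniform on [0, 1], and the coefficients satisfy $\sum_{j=1}^{n} a_j = 2\theta_n (1 - \rho^n)/(1 - \rho) = 1$.
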

The function $F_n$ can be found, following the method of Uspensky 
([9], p. 277); it is piecewise a polynomial function. 


11. The Characteristic Function of S 


Since S is the infinite sum of independent random variables, the 
characteristic function of S, i.e., the Fourier transform of $F(\lambda)$, is the 
product of the characteristic functions of the summands. Thus 

$$E\{e^{itS}\} = \prod_{j=1}^{\infty} \left[(1-p) + p\, e^{\,it\theta(1-\theta)^{j-1}}\right].$$








In the case p = 1/2, this can be simplified to 

$$\exp(it/2) \prod_{j=1}^{\infty} \cos\left[(1/2)\, t\theta(1-\theta)^{j-1}\right].$$
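
The simplification for p = 1/2 rests on one identity, spelled out here as an added illustration with $u_j = t\theta(1-\theta)^{j-1}$ (a symbol introduced only for this step):

$$\tfrac{1}{2}\left(1 + e^{iu_j}\right) = e^{iu_j/2} \cos(u_j/2), \qquad \sum_{j=1}^{\infty} u_j = t\theta \sum_{j=1}^{\infty} (1-\theta)^{j-1} = t .$$

The product of the exponential factors is therefore $\exp(it/2)$, and the cosine factors remain.
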


NOTE ON THE LEAST SQUARES SOLUTION FOR THE 
METHOD OF SUCCESSIVE CATEGORIES* 


R. DARRELL BOCK 
UNIVERSITY OF CHICAGO 


The problem of estimation in the method of successive categories is 
reconsidered and a new least squares solution is obtained. An empirical com- 
parison of this solution with Gulliksen’s solution is presented. 


Gulliksen [8] has proposed a least squares solution for the method of 
successive categories which, he suggests, is formally equivalent to Horst’s 
[10] solution for the matrix of incomplete data. Because of the way in which 
the terms to be minimized by least squares are defined, there appears to be 
an important difference between the two solutions. If the purpose of the 
method is to estimate location and scale parameters for the distributions of 
preferences of the objects being rated, then Horst’s definition is perhaps the 
more appropriate. 

In Horst’s formulation the observation matrix consists of quantitative 
ratings assigned to a number of objects by a number of raters. These ratings 
enter into the solution without grouping or other transformation. The aim 
of the solution is to characterize each object by some single score and to 
adjust the location and scale of the ratings of each rater so that the sum of 
squared discrepancies between the single scores and the adjusted ratings is 
minimal. If there are p raters and q objects, this sum ranges over pq terms. 

The original data to which Gulliksen’s solution applies are also the 
ratings of a number of objects by a number of raters, but the ratings are 
made in coarse, successive categories and are grouped without retaining 
the identities of the individual raters. On the assumption of normally dis- 
tributed affective values underlying the coarsely grouped ratings, the pro- 
portions of ratings for each object falling in the various categories are 
transformed to normal deviates. If there are r categories, the resulting r X q 
table of deviates is considered the observation matrix. The problem is then 
to adjust the location and scale of deviates in the columns of this table so as 
to minimize the sum of squared discrepancies between the adjusted deviates 
and some single value assigned to the respective category boundary. Since 
the extreme boundaries are indeterminate, this sum ranges over q(r - 1) 
terms. 

*Preparation of this paper has been supported by the Quartermaster Food and 
Container Institute for the Armed Forces. Views or conclusions expressed herein 
are those of the author and do not necessarily reflect the views or indorsement 
of the Department of Defense. 

Thus, Gulliksen’s solution does not take the original ratings as observations 
nor does it seek a transformation of the category boundaries which 
minimizes the discrepancies between the p ratings received by the objects 
and the location parameters which characterize their mean affective values. 
Instead, it treats the normal deviates as observations and seeks a single 
value for each category boundary which minimizes squared discrepancies 
of deviates after they have been adjusted for location and scale. As a result, 
the outlying boundaries, which usually bracket only a few ratings, are 
disproportionately represented in the solution. It may be possible to correct 
this difficulty by some weighting scheme, but it should not be supposed 
that the Müller-Urban weights are appropriate for this purpose. The justification 
for the Müller-Urban weights is that they correct asymptotically 
for the different variances of normal deviates corresponding to different 
observed proportions. They are derived on the assumption that the re- 
spective proportions are statistically independent, as they are in the constant 
method [7] or in the usual toxicological experiments [5]. Deviates based on 
cumulative frequencies are not independent, and a weighted solution would 
have to incorporate both their variances and covariances. 

An alternative successive intervals solution, which minimizes an error 
term strictly analogous to Horst’s, can be derived in the following way: (i) 
Carry the normal deviates to the centroidal point of the interval corresponding 
to each category, and let the values of these deviates represent estimates 
of the affective values of the ratings in the respective category. (ii) Multiply 
each deviate by the number of ratings in each category so as to reproduce, 
in effect, the sum of affective values for all raters. (iii) Determine the single 
values for the centroidal points of each of the categories, and the location 
and scale constants for each distribution which minimize the sum of squared 
discrepancies between the reproduced ratings in all categories and the re- 
spective single values which are to represent them. The resulting sum will 
range over pq terms. The centroidal points can be determined by Pearson’s 
formula for the mean point of an interval under the normal curve. Since 
normality must be assumed in any case, this does not introduce an additional 
assumption. 

The solution which results from this formulation, in addition to 
minimizing what is perhaps a more appropriate error term, is attractive in 
several other respects. It yields directly scale values which may be used to 
quantify the ratings for further statistical operations, e.g., for correlating 
ratings between raters or objects, for analysis of variance, etc. Zero entries 
in the frequency table cause no difficulty. The derivation is concise and the 
results, which are rather different from Gulliksen’s, may be expressed entirely 
in matrix operations which are easy to follow during computations. The 
solution can be obtained either directly, or iteratively, without excessive 
labor. Critical steps in the computations involve only square symmetric 
matrices of order one less than the number of categories. A very rapid 
approximate solution is also possible. 


Derivation of the Least Squares Solution 


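Data obtained by the method of successive categories may be exhibited 
in a frequency table of the form 

$$\begin{array}{c|c|c}
 & \text{Objects } j = 1, 2, \cdots, q & \\ \hline
\text{Categories } k = 1, 2, \cdots, r & n_{kj} & n_{k.} \\ \hline
 & n_{.j} & n_{..}
\end{array}
\qquad 0 \le n_{kj} \le p. \tag{1}$$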

By Pearson’s formula (see Guilford [7], p. 237), normal deviates, $z_{kj}$, for 
the centroidal points of the intervals spanned by the categories in each 
distribution in the columns of (1) may be determined. Then, according to 
the law of categorical judgment, the population values $\zeta_{kj}$ corresponding 
to the deviates $z_{kj}$ will be related to the true scale values for the categories 
$\xi_k$ by the relation 

$$\sigma_j \zeta_{kj} = \xi_k - \mu_j,$$

where $\sigma_j$ is the discriminal dispersion and $\mu_j$ the mean affective value 
for the jth object. In the sample the relation between the corresponding 
estimates of these quantities would be 

$$s_j z_{kj} \cong x_k - m_j. \tag{2}$$


Following Horst [10], the problem is to determine values for $s_j$, $x_k$, 
and $m_j$ for which the congruence of the left and right members of (2) is 
maximal. A measure of congruence is given by the correlation R between 
the right and left members of (2) based on the within column sums of squares 
and cross products. Then 

$$R^2 = \frac{\left[\sum_k \sum_j s_j (z_{kj} - z_{.j})(x_k - m_j)\, n_{kj}\right]^2}{\left[\sum_k \sum_j s_j^2 (z_{kj} - z_{.j})^2 n_{kj}\right]\left[\sum_k \sum_j (x_k - m_j)^2 n_{kj}\right]}, \tag{3}$$

where 

$$z_{.j} = \Big(\sum_k z_{kj} n_{kj}\Big)/n_{.j}. \tag{4}$$


Since the maximum of $R^2$ is independent of the origin and unit of the 
assigned and derived scale-values, impose the conditions that 

$$\sum_k \sum_j (x_k - m_j)\, n_{kj} = 0, \tag{5}$$

$$\sum_k \sum_j s_j^2 (z_{kj} - z_{.j})^2 n_{kj} = \sum_k \sum_j (x_k - m_j)^2 n_{kj} = c, \tag{6}$$

where c is any finite constant. 
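Introducing the undetermined multipliers $\lambda$, $\kappa$, and differentiating the 
expression 

$$\Big[\sum_k \sum_j s_j (z_{kj} - z_{.j})(x_k - m_j)\, n_{kj}\Big]^2 - 2\lambda\Big(\sum_k x_k n_{k.} - \sum_j m_j n_{.j}\Big) - \kappa\Big[\sum_k \sum_j s_j^2 (z_{kj} - z_{.j})^2 n_{kj} + \sum_k \sum_j (x_k - m_j)^2 n_{kj} - 2c\Big] \tag{7}$$

with respect to $m_j$, $x_k$, and $s_j$, and then equating to zero yield, respectively, 
the stationary equations (8), (9), and (10). 

$$\Big[\sum_g \sum_h s_h (z_{gh} - z_{.h})(x_g - m_h)\, n_{gh}\Big] \sum_k s_j (z_{kj} - z_{.j})\, n_{kj} - \lambda n_{.j} - \kappa \sum_k (x_k - m_j)\, n_{kj} = 0, \tag{8}$$

$$\Big[\sum_g \sum_h s_h (z_{gh} - z_{.h})(x_g - m_h)\, n_{gh}\Big] \sum_j s_j (z_{kj} - z_{.j})\, n_{kj} - \lambda n_{k.} - \kappa \sum_j (x_k - m_j)\, n_{kj} = 0, \tag{9}$$

$$\Big[\sum_g \sum_h s_h (z_{gh} - z_{.h})(x_g - m_h)\, n_{gh}\Big] \sum_k (z_{kj} - z_{.j})(x_k - m_j)\, n_{kj} - \kappa\, s_j \sum_k (z_{kj} - z_{.j})^2 n_{kj} = 0. \tag{10}$$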
k 


Summing (8) on j or (9) on k gives $\lambda = 0$. Multiplying (10) by $s_j$ and summing 
on j gives $\kappa = cR^2$. 

Substituting for $\lambda$ and $\kappa$ in (8) gives 

$$\sum_k (x_k - m_j)\, n_{kj} = \sum_k s_j (z_{kj} - z_{.j})\, n_{kj}/R. \tag{11}$$

Since $z_{.j} = (\sum_k z_{kj} n_{kj})/n_{.j}$, the right member of (11) vanishes, and 

$$m_j = \Big(\sum_k x_k n_{kj}\Big)/n_{.j}. \tag{12}$$

Substituting for $\lambda$ and $\kappa$ in (9) gives 

$$\sum_j (x_k - m_j)\, n_{kj} = \sum_j s_j (z_{kj} - z_{.j})\, n_{kj}/R,$$

and, substituting from (12), 

$$x_k n_{k.} - \sum_j \frac{1}{n_{.j}} \Big(\sum_h x_h n_{hj}\Big) n_{kj} = \sum_j s_j (z_{kj} - z_{.j})\, n_{kj}/R. \tag{13}$$

Substituting for $\kappa$ in (10) gives 

$$s_j \sum_k (z_{kj} - z_{.j})^2 n_{kj} = \sum_k (z_{kj} - z_{.j})(x_k - m_j)\, n_{kj}/R = \sum_k (z_{kj} - z_{.j})\, x_k n_{kj}/R - m_j \sum_k (z_{kj} - z_{.j})\, n_{kj}/R.$$

But $\sum_k (z_{kj} - z_{.j})\, n_{kj} = 0$, hence 

$$s_j \sum_k (z_{kj} - z_{.j})^2 n_{kj} = \sum_k (z_{kj} - z_{.j})\, x_k n_{kj}/R. \tag{14}$$

To express explicitly the solutions for the systems of equations (12), 
(13), and (14), it is convenient to adopt matrix notation and to define the 

    r × q matrix $T = [(z_{kj} - z_{.j})\, n_{kj}]$, 
    r × q matrix $F = [n_{kj}]$, 
    r × r diagonal matrix $D_k = [n_{k.}]$, 
    q × q diagonal matrix $D_j = [n_{.j}]$, 
    q × q diagonal matrix $D_z = [\sum_k (z_{kj} - z_{.j})^2 n_{kj}]$, 
    1 × r row vector $x = [x_k]$, 
    1 × q row vector $s = [s_j]$, 
    1 × q row vector $m = [m_j]$. 

Then (12) may be written as the matrix equation 

$$m = xFD_j^{-1}, \tag{15}$$

and (13) may be written 

$$x(D_k - FD_j^{-1}F') = sT'/R,$$

or 

$$xB = sT'/R, \tag{16}$$

and (14) may be written 

$$sD_z = xT/R. \tag{17}$$

(In these equations R is a scalar.) 

Equations (16) and (17) must be solved simultaneously. Substituting 
for s in (16), 

$$x(TD_z^{-1}T' - R^2B) = 0. \tag{18}$$

Since the rows and columns of both $TD_z^{-1}T'$ and B sum to zero, the 
determinant of $(TD_z^{-1}T' - R^2B)$ is identically zero for all values of $R^2$ and 
(18) cannot be solved as it stands. However, arbitrarily assign a zero value 
to one of the elements of x, say $x_r$, and write (18) as 

$$x^*[(TD_z^{-1}T')^* - R^2B^*] = 0,$$

where the asterisks indicate that the row and columns corresponding to $x_r$ 
have been dropped. Compute the inverse of $B^*$ and write 

$$x^*[(TD_z^{-1}T')^* B^{*-1} - R^2 I] = 0. \tag{19}$$

Thus, $x^*$ is the latent vector associated with the largest root of the 
determinantal equation 

$$|(TD_z^{-1}T')^* B^{*-1} - R^2 I| = 0.$$

This root and vector may be evaluated readily by Hotelling’s method [11]. 
It is easy to show that 

$$x_k = x_k^* + x_r,$$

where $x_r^* = 0$. Hence, the subsidiary condition (5) may be used to find x 
by determining $x_r$ from 

$$x_r = -\Big(\sum_k x_k^* n_{k.}\Big)/n_{..}. \tag{20}$$

A second iterative method, which converges somewhat more slowly but 
requires fewer preliminary computations, can be obtained by bringing 
(16) and (17) into the form 

$$Rx^* = sT^{*\prime}B^{*-1} \tag{21}$$

and 

$$Rs = xTD_z^{-1}. \tag{22}$$

A trial vector, $s^{(1)}$, e.g., (1, 1, ..., 1), may be substituted in (21) to yield 
$Rx^{*(1)}$. The subsidiary condition (5) may be applied to obtain $Rx^{(1)}$, which 
may be substituted in (22) to obtain $s^{(2)}$, and the cycle repeated until 
successive values of the vectors x or s are essentially stationary. 

The scales of the vectors resulting from either of these methods are 
arbitrary, but may be adjusted by condition (6). Thus, if $s_a$ and $x_a$ are the 
arbitrary vectors, they may be brought to scale by 

$$s = s_a \sqrt{c/(s_a D_z s_a')},$$

$$x = x_a \sqrt{c/(x_a B x_a')},$$

where c is any convenient constant, e.g., $(n_{..} - q)$. 
If the second iterative method is used, $R^2$ may be obtained from 

$$R^2 = \frac{xTD_z^{-1}T'x'}{xBx'}.$$

In the matrix $B = (D_k - FD_j^{-1}F')$, the term $D_k$ may be dominant. If 
this is the case a rapid and reasonably accurate solution may be obtained 
by substituting $D_k^{-1}$ for $B^{*-1}$ in the second iterative method. 
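
The rapid approximate solution lends itself to a short sketch, since with a diagonal matrix in place of $B^{*-1}$ no matrix inversion is needed. The code below is an illustration under stated assumptions, not the author's computing routine: the function names are invented for this example, the frequencies are hypothetical, the full diagonal $D_k$ is used without dropping an element of x (the columns of T sum to zero, so the resulting x already has a weighted mean of zero), and the centroidal deviates are computed with the standard normal distribution from Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def column_deviates(col):
    """Centroidal normal deviates for one object's frequencies, lowest category first."""
    std, total = NormalDist(), sum(col)
    cum, y_prev, out = 0, 0.0, []
    for n_k in col:
        cum += n_k
        # Normal ordinate at the upper boundary; the extreme ordinates are taken as zero.
        y_k = std.pdf(std.inv_cdf(cum / total)) if 0 < cum < total else 0.0
        out.append((y_prev - y_k) / (n_k / total) if n_k else 0.0)
        y_prev = y_k
    return out

def approximate_scaling(F, iterations=200):
    """F[k][j] is the frequency of category k (lowest first) for object j."""
    r, q = len(F), len(F[0])
    n_row = [sum(row) for row in F]                              # diagonal of D_k
    n_col = [sum(F[k][j] for k in range(r)) for j in range(q)]   # diagonal of D_j
    Z = [column_deviates([F[k][j] for k in range(r)]) for j in range(q)]  # Z[j][k]
    z_bar = [sum(Z[j][k] * F[k][j] for k in range(r)) / n_col[j] for j in range(q)]
    T = [[(Z[j][k] - z_bar[j]) * F[k][j] for j in range(q)] for k in range(r)]
    d_z = [sum((Z[j][k] - z_bar[j]) ** 2 * F[k][j] for k in range(r)) for j in range(q)]
    c = sum(n_col) - q          # the convenient constant n.. - q
    s = [1.0] * q               # trial vector (1, 1, ..., 1)
    for _ in range(iterations):
        # (21) with D_k^{-1} substituted for B*^{-1}: x is proportional to s T' D_k^{-1}
        x = [sum(s[j] * T[k][j] for j in range(q)) / n_row[k] for k in range(r)]
        # (22): s is proportional to x T D_z^{-1}
        s = [sum(x[k] * T[k][j] for k in range(r)) / d_z[j] for j in range(q)]
        # condition (6): rescale so that s D_z s' = c
        scale = sqrt(c / sum(v * v * d for v, d in zip(s, d_z)))
        s = [v * scale for v in s]
    # condition (6) for x: x B x' = c, with B = D_k - F D_j^{-1} F'
    col_sums = [sum(x[k] * F[k][j] for k in range(r)) for j in range(q)]
    xBx = sum(v * v * n for v, n in zip(x, n_row)) - sum(t * t / n for t, n in zip(col_sums, n_col))
    x = [v * sqrt(c / xBx) for v in x]
    m = [sum(x[k] * F[k][j] for k in range(r)) / n_col[j] for j in range(q)]  # (15)
    return x, s, m

# Hypothetical frequencies: 4 categories (rows, lowest first) by 3 objects (columns).
F = [[5, 10, 2], [20, 25, 10], [40, 30, 35], [15, 10, 30]]
x, s, m = approximate_scaling(F)
print([round(v, 3) for v in x], [round(v, 3) for v in s], [round(v, 3) for v in m])
```

Rescaling s on every cycle only keeps the numbers in a convenient range; the direction of the vectors is what the iteration determines.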


Numerical Example 


The preferences of 245 enlisted personnel of the United States Army 
were obtained for a set of menu items by means of a nine category hedonic 
scale [13]. The frequency table of ratings for twelve of these foods is shown 
in Table 1. The body of Table 1 makes up the matrix F, the column totals 
the matrix $D_j$, and the row totals the matrix $D_k$. 


TABLE 1 

Preferences for Twelve Foods Obtained by the Method of Successive Categories 

Categories                          Foods                                 Total 
              1    2    3    4    5    6    7    8    9   10   11   12 

    9         4    6   31    4    7    2   32    3   26    1   11   12     139 
    8        13   30   77   11   56   38   85   25   70   15   46   46     512 
    7        32   72   75   33   72   91   74   64   72   32   69   92     778 
    6        37   62   40   32   60   76   33   74   41   40   42   61     598 
    5        47   27   16   47   27   28   20   36   24   35   23   24     354 
    4        29   26    7   45   17    9    7   23   14   48   15    9     249 
    3        26   10    2   31    6    7    2   11    3   33   10    8     149 
    2        31    6    1   30    5    2    1    9    0   30   16    1     132 
    1        24   15    4   17    4    0    0    8    2   19   21    1     115 

  Total     243  254  253  250  254  253  254  253  252  253  253  254    3026 


Frequencies in Table 1 were divided by their column totals and cumulated 
upwards. Normal deviates to the interval marks were estimated by the formula 

$$z_{kj} = (y_{(k-1)j} - y_{kj})/(n_{kj}/n_{.j}),$$

where $y_{kj}$ and $y_{(k-1)j}$ are ordinates of the normal curve for the cumulative 
frequencies of the k and (k - 1)th category, for the jth menu item, the extreme 
ordinates being taken as zero. 
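
The formula for the normal deviates can be sketched in a few lines. The code below is an illustration rather than the author's procedure: the function name `centroid_deviates` is invented here, the standard normal ordinates and quantiles come from Python's `statistics.NormalDist`, and a category with zero frequency is given a placeholder deviate of 0.0, which is harmless only because such a cell enters later sums with zero weight. The frequencies are those of food 1 in Table 1, listed from category 1 to category 9.

```python
from statistics import NormalDist

def centroid_deviates(freqs):
    """Normal deviates at the centroids of successive categories.

    freqs lists the frequencies for one object from the lowest category upward.
    """
    std = NormalDist()
    total = sum(freqs)
    deviates, cumulative, y_prev = [], 0, 0.0   # ordinate below the lowest category is zero
    for n_k in freqs:
        cumulative += n_k
        # Ordinate of the normal curve at the cumulative proportion; zero at the extremes.
        if 0 < cumulative < total:
            y_k = std.pdf(std.inv_cdf(cumulative / total))
        else:
            y_k = 0.0
        deviates.append((y_prev - y_k) / (n_k / total) if n_k else 0.0)
        y_prev = y_k
    return deviates

food_1 = [24, 31, 26, 29, 47, 37, 32, 13, 4]   # categories 1..9, first column of Table 1
z = centroid_deviates(food_1)
print([round(v, 3) for v in z])
```

Because the extreme ordinates are zero, the differences of ordinates telescope and the frequency-weighted mean of the deviates in a column is zero.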

From the original frequencies and the normal deviates the matrices 
B, T, and $D_z$ were formed. Nearly stationary values for x and s were computed 
by the second iterative method and brought to scale by (6). Mean affective 
values for the foods were computed by (15). The resulting values, together 
with approximations obtained by setting $B = D_k$, are shown in Table 2. 
Also shown are the corresponding estimates computed by an iterative version 
of Gulliksen’s method due to Diederich, Messick, and Tucker [4]. The weights 
used to make the latter method appropriate for incomplete data were zero 
for the three unused cells (Table 1) and unity elsewhere. All the resulting 
vectors have been brought to the same location and scale to facilitate their 
comparison. 

The regression line for food 3 has been plotted from the values of $m_j$ 
and $s_j$ from Table 2 for the exact solution. The corresponding regression line 
for Gulliksen’s solution is shown for comparison. Normal deviates to the 
interval marks and boundaries are also plotted (Fig. 1). 

Food 3 was chosen because the preference for it scaled poorly. Note 
that the regression line for Gulliksen’s solution tips away from the deviates 
which represent the upper, much used, categories and toward the deviates 
of the lower, little used, categories. The alternative solution necessarily fits 
closely the deviates for the much used upper categories and tips away 
somewhat from the two lowest categories. 


TABLE 2 

Comparison of Successive Intervals Solutions 

                    Gulliksen's         Alternative Solution 
                     Solution           Exact      Approximate 

Interval Scale Values 
  Categories 
       9               2.03             2.537 
       8            [illegible]         1.320         1.326 
       7                .105             .475          .432 
       6               -.480            -.203         -.281 
       5               -.927            -.688         -.682 
       4              -1.331           -1.093         -.985 
       3              -1.651           -1.459        -1.282 
       2            [illegible]        -1.856        -1.634 
       1            [illegible]        -2.538        -2.660 

Discriminal Dispersions 
  Foods 
       1               1.152            1.103         1.080 
       2               1.067            1.081         1.079 
       3               1.122            1.058         1.110 
       4               1.074            1.021          .984 
       5                .970             .992          .992 
       6                .748             .777          .768 
       7                .950             .989         1.033 
       8                .941             .956          .933 
       9               1.045            1.057         1.094 
      10               1.002            1.017          .978 
      11               1.253            1.266         1.275 
      12                .881             .907          .918 

Mean Affective Values 
  Foods 
       1               -.756            -.765         -.731 
       2               -.157            -.139         -.149 
       3                .707             .668          .669 
       4               -.722            -.770         -.721 
       5                .161             .186          .176 
       6                .127             .157          .131 
       7                .781             .750          .755 
       8               -.200            -.188         -.196 
       9                .602             .565          .568 
      10               -.776            -.788         -.743 
      11               -.142            -.089         -.089 
      12                .341             .307          .292 


Discussion 


The successive intervals solution proposed in this note is currently 
being used in connection with models for predicting consumer choice and 
purchase (Bock [3] and Jones [12]). Because predictions based on these 
models are sensitive to small differences in the estimated mean affective 
values of consumer objects, it is important that the method of scaling the 
preferences yield efficient estimates of these values. The proposed solution, 
which leads to the conventional mean once the scale values for the categories 
are chosen, has this property. The models also require estimates of the 
correlation of preferences between objects, and it is convenient to have a 
solution in terms of interval mid-points which can be applied to the compu- 
tation of the correlations by the usual methods for grouped data. 


[FIGURE 1. Comparison of Regression Lines for Food 3. Horizontal axis: normal 
deviates to interval marks and boundaries. Plotted points: interval boundaries 
($t_k$) and interval marks ($x_k$). Gulliksen's solution: $m_3$ = .707, $s_3$ = 1.122. 
Alternative solution: $m_3$ = .668, $s_3$ = 1.058.] 

Since the successive intervals solutions assume normally distributed 
affective values, they are preferable for applications in which the normality 
assumption is crucial, e.g., the prediction of choice models. In studies where 
the detection of differences among the objects is of primary importance, 
on the other hand, it may be desirable to choose scale values for the categories 
which maximally discriminate among objects. A scaling solution which has 
this property has been proposed by Fisher [6] and Guttman [9]. When applied 
to successive categories data (Bock [2]), it determines scale values for the 
categories which minimize the error term in a one-way analysis of variance; 
hence, the method might appropriately be called “least error” scaling. Interest- 
ing statistical tests which may be used in connection with least error scales 
have been described by Bartlett [1]. 


COMMUNALITY OF A VARIABLE: 
FORMULATION BY CLUSTER ANALYSIS 


Robert C. Tryon* 
UNIVERSITY OF CALIFORNIA 


The communality of a variable represents the degree of its generality 
across n — 1 behaviors. Domain-sampling principles provide a fundamental 
conception and definition of the communality. This definition may be alternatively 
stated in eight different ways. Three definitions lead to precise formulas 
that determine the true value of the communality: (i) from the k 
necessary and sufficient dimensions derived by iterated factoring, (ii) from 
the n - 1 remaining variable-domains, and (iii) from k' multiple clusters of 
the n variables. Seven definitions provide approximation formulas: (i) one 
from the k dimensions as initially factored, (ii) one from the n - 1 remaining 
variables, and (iii) five from a single cluster. Rank of the matrix is not a 
desideratum in some definitions. Using an example designed by Guilford to 
illustrate multiple-factor analysis, applications of the formulas based on the 
three precise definitions recover the true communalities, and five approxima- 
tion formulas each gives values closer than the ad hoc estimates usually 
employed in factor analysis. 


Some characteristics of individuals appear to have greater generality 
than others. By generality is meant the degree to which variation in a given 
behavior is also revealed in other behaviors—the degree to which it may be 
predicted from the others. One objective measure of its generality is its 
communality $h^2$. The communality is an important companion statistic to 
the reliability coefficient of the variable. Just as the reliability coefficient 
gives the degree of generality of a particular measurement across strictly 
comparable test-samples of the same behavior property, the communality 
of the measure gives its generality across different behavior properties. 

The rise of factor analysis has introduced narrowness in the under- 
standing and computation of both reliability and communality [10]. The 
first statistic is discussed in [11]; here communality is defined in terms of the 
operations an investigator actually goes through in taking n measures of 
objects by sampling methods. Following from such a definition one may 
derive formulas for the exact value of $h^2$. The simple algebra of communality 
follows directly from the doctrines of behavior domain-sampling. 

As an illustration of the formulations presented here, Table 1 presents 
results of an application to an artificial problem designed by Guilford for his 
presentation of factor analysis ([2], p. 478ff.). In this 10-variable problem 
the true communality of each variable is given in the first row of Table 1. 


*The writer wishes to express his indebtedness to C. F. Wrigley and H. Kaiser for 
their many helpful constructive criticisms. 
On domain-sampling principles, the communality of a variable may be computed 
according to the way the investigator designs his analysis. Thus in 
Table 1 the value may be computed (i) by factoring, (ii) directly from the 
n - 1 variables, (iii) from a single cluster of variables, or (iv) from multiple 
clusters. Some formulations give true values (rows 2, 6, 12); others are 
approximations (rows 3, 4, 5, 7, 8, 9, 10, 11, 13). The closeness of fit is given 
by the mean absolute deviation of computed from true values, last column. 
For comparative purposes, Table 1 includes results obtained by factor 
analysis (see row 4, and the ad hoc estimates, rows 14 and 15). 


Fundamental Definition of Communality 

In cluster analysis, computing the value of the communality $h_a^2$ of any 
fallible variable a, among n intercorrelated fallible variables a, b, ..., n, 
requires one of various computing formulas that follow from alternative 
meanings or definitions of the communality. For the given set of n variables 
there is, of course, only one correct value of $h_a^2$ in the population. Consider 
here the fundamental definition of communality. 

Definition 1: $h^2$, the correlation with a congruent parallel variable 

Under the principles of domain-sampling, one definition of $h_a^2$ is 

(1) $h_a^2 = r_{aa'} = r_{aa''} = \cdots = r_{aa_\infty}$, 
where a' is a parallel construct variable from an infinite set of parallel 
variables, a, a', a'', ..., $a_\infty$, all of which would have perfectly congruent 
correlation profiles over the observed n - 1 variables. Specifically, perfect 
congruence means 

(2) $r_{ai} = r_{a'i} = r_{a''i} = \cdots = r_{a_\infty i}$, $(i = b, c, \ldots, n)$. 

Definition 2: $h^2$, the predictable variance from the variable-domain 

The observed variable a may be considered as one sample measure from 
a composite score $C_a$ on a domain of such parallel variables, namely, 

(3) $C_a = z_a + z_{a'} + \cdots + z_{a_\infty} = \sum z_a$. 

$C_a$ may be called the cluster domain score of variable a. By the general formula 
for the correlation of sums and using (2), the correlation of observed variable 
a with its domain score $C_a$ is 

$r_{aC_a} = [1 + (n_\infty - 1) r_{aa'}] / \sqrt{n_\infty + n_\infty (n_\infty - 1) r_{aa'}}$. 

Dividing numerator and denominator by $n_\infty$, taking the limit, and using (1), 

(4) $r_{aC_a} = h_a$. 


The magnitude $h_a$ is termed the cluster domain validity of variable a. It 
reveals how well the observed variation among objects in variable a matches 
that in a perfect cluster domain measure of $C_a$. For a geometric model of 
these relations, see [12]. 
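
The limiting argument behind (4) can be checked numerically. The following is a minimal sketch and not part of the original derivation: the function name `domain_validity` and the sample value of the communality are assumptions chosen for illustration. It evaluates the finite-n correlation of a variable with the sum of n parallel variables (itself included) whose common intercorrelation is $h_a^2$, and shows the value approaching $h_a$ as n grows.

```python
import math


def domain_validity(h2: float, n: int) -> float:
    """Correlation of one variable with the sum of n parallel variables
    (the variable itself included) whose common intercorrelation is h2.
    This is the finite-n correlation-of-sums expression preceding (4)."""
    numerator = 1.0 + (n - 1) * h2
    denominator = math.sqrt(n + n * (n - 1) * h2)
    return numerator / denominator


if __name__ == "__main__":
    h2 = 0.64  # hypothetical communality, so h_a = 0.8
    for n in (2, 10, 100, 10_000):
        print(n, round(domain_validity(h2, n), 4))
    print("limit h_a =", math.sqrt(h2))
```

With a single variable (n = 1) the expression equals unity, as it must, and it declines toward $h_a$ as more parallel variables enter the composite.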

The variance of the fallible scores predicted from the domain scores is 
the square of the correlation between them, i.e., from (4) 

(5) $h_a^2 = r_{aC_a}^2$, 
which is the second definition of the communality. 

The two definitions of the communality in (1) and (5) have certain 
restrictions: 

(i) The parallel construct variables a, a', ..., $a_\infty$ must, by operational 
definition, measure different behavior domains, as stated in the first paragraph 
of this article. Variables that are sample measures of the same behavior 
domain are the elements essential to calculating the reliability coefficient [11]. 

(ii) Equality of correlations of the parallel variables, as given in (2), 
is stated above as a restriction on both definitions. Such an equality is, 
however, an unnecessary restriction on the second formulation for, as will 
be shown later, (5) still holds under the more general restriction that the 
parallel variables that compose $C_a$ need only reveal proportional correlations 
(26). 

Note that (1) and (5) provide two meaningful definitions of the com- 
munality quite independent of the number of dimensions (or “factors,’’ or 
the ‘“‘rank”’) necessary and sufficient to reproduce the n — 1 correlations of 
variable a. 


Communality from k Independent Dimensions 
Definition 3: $h^2$, the sum of partial communalities 

In cumulative communality cluster analysis, termed CCC analysis 
[12], as in centroid factor analysis, the number of independent cluster domain 
scores $C_1, C_2, \ldots, C_k$ required to reproduce the correlation matrix (reduce 
the residual correlations to zero) is determined by a factoring procedure. 
The k independent cluster domains are linear composites of the n variables. 
The result is that the domain score $C_a$ of variable a is perfectly predicted 
from scores on the independent cluster dimensions, whence 


(6) $r_{aC_a}^2 = R^2_{a \cdot C_1 C_2 \cdots C_k} = \sum \beta_{aC_i} r_{aC_i}$, $(i = 1, 2, \ldots, k)$. 


Since the predictors are uncorrelated, then from (5) 

(7) $h_a^2 = r_{aC_1}^2 + r_{aC_2}^2 + \cdots + r_{aC_k}^2$, 

where the successive $r^2$ terms are called the partial communalities (squared 
coordinates), $h_{a1}^2, h_{a2}^2, \ldots, h_{ak}^2$, respectively. These are the predictable 
variances of the objects' observed scores on a from the k independent dimensions. 
Factorists call these variances the squared "factor loadings." 
Here is a third definition of $h^2$, based on the factoring procedure. 

Definition 4: $h^2$, the squared multiple R with the independent dimensions 

The factoring process perforce yields the uncorrelated dimensions 
$C_1, C_2, \ldots, C_k$. In (6) for such independent predictors, 

$\beta_{aC_i} = r_{aC_i}$, 

whence 

(8) $h_a^2 = \beta_{aC_1}^2 + \beta_{aC_2}^2 + \cdots + \beta_{aC_k}^2$; 
(9) $h_a^2 = R^2_{a \cdot C_1 C_2 \cdots C_k}$. 

Thus definition 4: the communality is the squared multiple R 
between variable a and the k independent dimensions necessary and sufficient 
to yield vanishing residual correlations. 


Communality from the n — 1 Variables 
Definition 5: $h^2$, the squared multiple R with the n - 1 remaining variable-domains 

Each of the independent cluster domains $C_1, C_2, \ldots, C_k$ is a mere 
linear composite of variable-domain scores ([12], formula 8), 


(10) $C_i = C_a + C_b + \cdots + C_n$. 


By excluding any one of the n variables, the k dimensions are not reduced 
to k - 1. For example, $C_a$ cannot alone be responsible for any one dimension; 
it must share with at least one other variable (and usually more) some 
predictable variance from the dimension, otherwise the dimension would 
not be required. Therefore, $C_b, C_c, \ldots, C_n$ may replace $C_1, C_2, \ldots, C_k$ 
in (9), whence 

(11) $h_a^2 = R^2_{a \cdot C_b C_c \cdots C_n}$. 


Definition 5 implies that the communality of a variable represents the 
degree of its generality across the n — 1 other variables, being its variance 
predictable from the remaining variable-domains. 


Evaluation of $R^2_{a \cdot C_b C_c \cdots C_n}$ 

Evaluation of $h_a^2$ in (11) will provide, along with (7), a second exact 
computing formula for the communality, in this case without factoring. 
Writing the predicted $\hat a$ of (11) in the form of a regression equation of a on 
the C's: 

(12) $\hat a = \sum \beta_{aC_i} z_{C_i}$. 

Here and below, let i, j = b, c, ..., n and i < j. 
Then, by the formula for multiple R in terms of $\beta$'s, 

(13) $h_a = R_{a \cdot C_b C_c \cdots C_n} = \sum \beta_{aC_i} r_{aC_i} / \sigma_{\hat a}$. 
In the numerator of (13) the r term equals the observed $r_{ai}$ augmented on 
i, i.e., 

(14) $r_{aC_i} = r_{ai}/h_i$. 

To develop the denominator term of (13), $\sigma_{\hat a}$, note that the z in (12) is the 
standard score of $C_i$. Noting that $C_i$ is a sum of parallel variables, i, i', 
..., $i_\infty$, as in (3), then (12) may be rewritten as 

(15) $\hat a = \sum (\beta_{aC_i}/\sigma_{C_i})(z_i + z_{i'} + \cdots + z_{i_\infty})$. 


Since the z scores here are those of the parallel variables symmetrical with 
(3), they have defined statistically parallel properties symmetrical with 
(1) and (2). 

The square root of the variance of (15) is needed in (13). This expression 
is the sum of all the elements of a covariance matrix which has as rows and 
columns the $n_\infty$ terms in $z_b$, the $n_\infty$ terms in $z_c$, and so on. This grand matrix 
is composed of submatrices on the leading diagonal involving the $r_{ii'}$ sets 
such as $r_{bb'}$, and the side submatrices involving the $r_{ij}$ sets such as $r_{bc}$. In 
the development below recall that $r_{ii'} = h_i^2$, by definition, as in (1), hence 

(16) $\sigma_{C_i}^2 = n_\infty + n_\infty (n_\infty - 1) h_i^2$. 


In the ith diagonal submatrix the sum of its own principal diagonal 
becomes, in the limit (as the result of dividing numerator and denominator 
by $n_\infty$), 

(17) $n_\infty \beta_{aC_i}^2 / \sigma_{C_i}^2 = \beta_{aC_i}^2 / [1 + (n_\infty - 1) h_i^2] = 0$. 

In this submatrix, the sum of its side elements becomes in the limit 

(18) $n_\infty (n_\infty - 1) \beta_{aC_i}^2 h_i^2 / \sigma_{C_i}^2 = \beta_{aC_i}^2$. 

In the ijth side submatrix, of which there are two by symmetry in the 
grand covariance matrix, the sum of their elements is, in the limit, 

(19) $2 n_\infty^2 (\beta_{aC_i}/\sigma_{C_i})(\beta_{aC_j}/\sigma_{C_j}) r_{ij} = 2 (\beta_{aC_i}/h_i)(\beta_{aC_j}/h_j) r_{ij}$. 

Summing the terms (17), (18), and (19) over the grand matrix, 

(20) $\sigma_{\hat a}^2 = \sum \beta_{aC_i}^2 + 2 \sum (\beta_{aC_i}/h_i)(\beta_{aC_j}/h_j) r_{ij}$. 

Finally, substituting (14) in the numerator of (13), and the square root of 
(20) in its denominator, gives by squaring the whole the communality, 

(21) $h_a^2 = R^2_{a \cdot C_b C_c \cdots C_n} = \dfrac{(\sum \beta_{aC_i} r_{ai}/h_i)^2}{\sum \beta_{aC_i}^2 + 2 \sum (\beta_{aC_i}/h_i)(\beta_{aC_j}/h_j) r_{ij}}$, 

$(i, j = b, c, \ldots, n)$ and $(i < j)$. 

Evaluating the expression on the extreme right of (21) would give an 
exact solution to $h_a^2$. There are n - 1 unknowns $h_b, h_c, \ldots, h_n$ in this equation. 
The $\beta$'s involve only terms in $r_{ij}$ (which are known) and in the n - 1 
values of the unknown h's. Since the h's are in first and second powers, the 
equation is quadratic. Recalling that there are n equations (21) to be solved 
simultaneously, formula (21) may be called the simultaneous quadratic formula 
for the communality. The positive roots of the n quadratics are the desired 
h values. 


Iterative solution of $R^2_{a \cdot C_b C_c \cdots C_n}$ 

By electronic computer the n simultaneous quadratics may be solved by 
inserting initial trial values of the h’s, solving the quadratics for new h values, 
and continuing the iterative process to convergence. Speed of convergence 
depends on the choice of trial values. The writer recommends that trial 
values be those known on domain-sampling principles to be close to the 
correct values. As will be shown later, approximation B provides such close 
values. Because this solution is a critical one in this paper, the steps involved 
in an electronic program are listed at this point: 


(i) Selection of a reference cluster of congruent variables for variable a: 

(a) Calculate the degree of congruence of the correlation profile of 
variable a with the profiles of each of these variables. An objective measure 
of congruence with such a variable, call it v, is the index of proportionality 
of the n - 2 corresponding $r_{ai}$ and $r_{vi}$ values (see [12], formula 6). 

(b) Select as two reference variables of a the two with the highest indices, 
and compute $h_a^2$ by approximation B, formula (29). 

The ten trial values for the Guilford problem by this approximation are 
listed in Table 1, row 5. 


(ii) Iteration for $h_a^2$ by the simultaneous quadratic formula (21): 

(a) One can now set up a correlation matrix with trial values of all the 
coefficients necessary to solve (21). One row is for variable a and its correlations 
with the domains $C_b, \ldots, C_n$. In this row, $r_{aC_i} = r_{ai}/h_i$. The remaining 
n - 1 rows and columns include the trial augmented correlations between 
the n - 1 predictor variable-domains, i.e., $r_{ij}/h_i h_j$. 

(b) Compute (21) for each of the n variables from the matrix described 
in (a) just above, thus securing the first iterated value of the communalities. 

(c) With these new values of h for all the variables, set up again new 
augmented matrices as described in (a) above. 

(d, etc.) Recompute the h values by (21) and continue the process to 
convergence. 
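
Step (ii)(a) above, the construction of the augmented matrix from trial values, can be sketched as follows. This is an illustration only: the function names `augmented_matrix` and `augmented_row` and the three-variable example are assumptions, not part of the program described in the text, and the sketch stops short of solving (21) itself.

```python
def augmented_matrix(r, h):
    """Trial correlations between variable-domains.

    r : square list of lists of observed correlations (diagonal ignored)
    h : trial values of h_i (square roots of the trial communalities)
    Off-diagonal entries are r_ij / (h_i * h_j); the diagonal is set to 1.
    """
    n = len(r)
    return [[1.0 if i == j else r[i][j] / (h[i] * h[j]) for j in range(n)]
            for i in range(n)]


def augmented_row(r, h, a):
    """Correlations of observed variable a with the other domains, r_ai / h_i."""
    return [r[a][i] / h[i] for i in range(len(r)) if i != a]


if __name__ == "__main__":
    # Hypothetical, perfectly congruent example: r_ij = h_i * h_j.
    h = [0.9, 0.8, 0.7]
    r = [[1.0, 0.72, 0.63],
         [0.72, 1.0, 0.56],
         [0.63, 0.56, 1.0]]
    print(augmented_matrix(r, h))
    print(augmented_row(r, h, 0))
```

In this congruent example every augmented correlation between domains is unity, which is what one expects when all variables measure a single domain; with real data the augmented values change from one iteration to the next as the trial h values are revised.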

I am indebted to Dr. Henry Kaiser for programing these procedures 
for the Guilford problem, using electronic computer IBM Type 650. The 
results are given in Table 1, rows 6a to 6e. Each iteration required 5 minutes. 
Note the speed of convergence to the true values. By the 10th iteration, the 
mean deviation from true values is 3 one hundred millionths. 

Dr. Henry Kaiser has tried upper and lower bounds of $h^2$ as trial values 
in (21) in a number of artificial and empirical matrices. His results appear 
promising but convergence by these approaches seems at the present writing 
to be erratic [5]. 


Approximation A: $h^2$ from the squared multiple R with the n - 1 variables 


Suppose the impossible: that one had before him a matrix of correlations 
among an indefinitely large number of variables. From (9) the scores of the 
objects on the independent cluster domains $C_1, C_2, \ldots, C_k$ would be mere 
linear composites of the $n_\infty$ variables. But since $h_a^2$ is the squared multiple 
R between variable a and the beta-weighted scores on the C's, it would equal 
the squared multiple R between a and beta-weighted scores on the $n_\infty$ variables 
that compose them, that is, 

(22) $R^2_{a \cdot bc \cdots n_\infty} = h_a^2$. 

In practice one deals with a finite n. If one now conceptualizes the 
finite variables as a representative sampling of n variables drawn from an 
infinite domain of $n_\infty$ variables having the same general pattern of correlations 
with a and with each other as do the actual n - 1 variables, then, on the 
theorem that the multiple R increases as one adds similar kinds of predictors, 

(23) $R^2_{a \cdot bc \cdots n} < R^2_{a \cdot bc \cdots n_\infty} = h_a^2$. 


The squared multiple R between variable a and the n — 1 other variables 
is thus a lower limit of the communality. Though this relation is already 
known [3], its simple logic and proof on domain-sampling principles is of 
interest. If the betas are significant on a reasonable number of the n variables, 
then it seems likely that the value of the squared multiple R approaches the 
value of the communality. The lower limits of the communalities by (23) 
for the ten variables of our illustration appear in Table 1, row 7. 
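
The lower-limit property in (23) can be illustrated on a small artificial matrix. The sketch below is not taken from the source: the helper names `solve`, `squared_multiple_r`, and `one_factor_matrix`, and the four hypothetical communalities, are assumptions made for the example. It computes the squared multiple R of one variable with the remaining variables by ordinary least-squares weights and compares it with the known communality.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def squared_multiple_r(R, a):
    """Squared multiple correlation of variable a with all other variables."""
    others = [i for i in range(len(R)) if i != a]
    A = [[R[i][j] for j in others] for i in others]
    r_a = [R[a][i] for i in others]
    beta = solve(A, r_a)  # standardized regression weights
    return sum(b * r for b, r in zip(beta, r_a))


def one_factor_matrix(h):
    """Hypothetical congruent matrix with r_ij = h_i * h_j and unit diagonal."""
    n = len(h)
    return [[1.0 if i == j else h[i] * h[j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    h = [0.9, 0.8, 0.7, 0.6]  # assumed true h values
    R = one_factor_matrix(h)
    print("squared multiple R:", squared_multiple_r(R, 0))
    print("true communality:  ", h[0] ** 2)
```

With only three predictors the squared multiple R falls well short of the communality; the text's argument is that the gap narrows as more variables of the same kind are added.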


Communality from a Single Cluster of Variables 


Above, $h_a^2$ is expressed as a function of all n variables, hereafter as a 
function of a cluster grouping of the n variables. First, develop the correlation 
between variable a and a domain score on a cluster of variables that includes 
a. This domain score is defined as 


(24) $C_s = \sum z_a + \sum z_b + \cdots + \sum z_s$ 
(24a) $= C_a + C_b + \cdots + C_s$, 

where s is the number of variables in the cluster, and 
$\sum z_a = C_a = z_a + z_{a'} + \cdots + z_{a_\infty}$ as defined in (3), 
the a, a', ..., $a_\infty$ parallel variables being perfectly 
congruent as defined in (2), having equal communalities, $r_{aa'} = h_a^2$ as defined 
in (1), and analogously for the other summation terms of (24). 

The correlation between variable a, the first term within $\sum z_a$, and 
the composite of (24) becomes by the correlation of sums in the limit 

(25) $r_{aC_s}^2 = \dfrac{(r_{aa'} + r_{ab} + r_{ac} + \cdots + r_{as})^2}{\sum r_{ii'} + 2 \sum r_{ij}} = \dfrac{(h_a^2 + r_{ab} + r_{ac} + \cdots + r_{as})^2}{\sum h_i^2 + 2 \sum r_{ij}}$, 

$(i, j = a, \ldots, s;\ i < j)$. 

Here is a basic formula of cluster analysis and, for that matter, of factor 
analysis. When $s = s_i$, it is the partial communality, $h_{ai}^2$, in the Key Cluster 
Method of CCC analysis ([12], formulas 9 and 13). When s = n, it is the 
squared centroid factor loading in Thurstone factor analysis. As seen below 
it provides the exact value of the communality of a variable under certain 
specified conditions that are, however, difficult to meet in practice. 


Definition 6: $h^2$, the correlation with a cluster domain of congruent (collinear) 
variables 


Take the case in which the variables that compose $C_s$ have congruent 
profiles of their columns of r in the correlation matrix, though not necessarily 
at the same level of correlation. (The term "congruent" has a geometrically 
equivalent expression, "collinear," referring to the fact that such variables 
lie on the same straight line from the origin, i.e., on the same vector in n or 
k space.) The precise definition of congruence (or collinearity) is 

(26) $r_{ai}/r_{bi}$ = a constant; $r_{bi}/r_{ci}$ = a constant; ...; 
$r_{(s-1)i}/r_{si}$ = a constant; (i = the remaining n - 2 variables). 


Congruent variables are also called "equiproportional," and their submatrix 
of intercorrelations is called "hierarchical," or having a "rank" of one. 

For the case of two variables, a and b, introducing their correlations 
with their respective parallel variables into (26), noting (1) and (2), gives 

$r_{aa'}/r_{ba'} = r_{ab'}/r_{bb'} = h_a^2/r_{ab} = r_{ab}/h_b^2$, 
or 
(27) $r_{ab} = h_a h_b$, that is, in general, $r_{ij} = h_i h_j$. 

Now write the special case of $r_{aC_s}^2$ by (25) for s congruent variables, i.e., 
substitute (27) in (25), 

$r_{aC_s}^2 = \dfrac{[h_a^2 + h_a (h_b + h_c + \cdots + h_s)]^2}{\sum h_i^2 + 2 \sum h_i h_j}$, $(i, j = a, \ldots, s;\ i < j)$. 

Multiplying out the numerator, then taking out $h_a^2$, the resulting parenthesis 
term will reduce to $(\sum h_i)^2$, which in turn after expansion becomes the equivalent 
of the denominator. In short, 
(28) $r_{aC_s}^2 = h_a^2 (\sum h_i)^2 / (\sum h_i^2 + 2 \sum h_i h_j) = h_a^2$, $(i, j = a, \ldots, s;\ i < j)$. 

Here, then, is definition 6 of the communality of variable a, namely, its 
squared correlation with a domain score of s congruent variables. 
To evaluate (28), one uses (27) to obtain 

(29) $r_{aC_s}^2 = h_a^2 = h_a^2 \sum h_i h_j / \sum h_i h_j = \sum (h_a h_i)(h_a h_j) / \sum h_i h_j = \sum r_{ai} r_{aj} / \sum r_{ij}$, $(i, j = b, \ldots, s;\ i < j)$. 

This formula was used by Spearman to measure his g saturation of variable 
a ([6], appendix, formula 20). On principles of domain-sampling, Spearman's 
g was thus simply the composite domain defined by (24) composed of congruent 
variables. But (29) is awkward to use in an actual matrix; hence its 
equivalent, also developed by Spearman ([6], appendix, formula 21), is used: 

(30) $h_a^2 = \dfrac{(\sum r_{ai})^2 - \sum r_{ai}^2 \quad (i = b, \ldots, s)}{2 [\sum r_{ij}\ (i, j = a, \ldots, s;\ i < j) - \sum r_{ai}\ (i = b, \ldots, s)]}$. 


Approximation B: $h^2$ from squared r with an approximately congruent cluster 
domain 



In practice one rarely finds strictly congruent variables. But one can 
always group the n variables systematically into as many approximately 
congruent clusters as possible (see above in evaluating formula (21), also 
[9], appendix B, and [12]). The approximate communality of each variable 
can then be computed by (30), or (29). In the Guilford problem, Table 1, 
row 8, the fit to the true values is shown by an average absolute deviation 
of .021. 
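
Formulas (29) and (30) can be compared directly on a small matrix. The sketch below is illustrative only: the function names and the congruent four-variable example are assumptions, and (30) is taken here in the form in which it is algebraically equivalent to (29), namely the squared row sum less the sum of squared row entries, divided by twice the pair sum of the cluster less the row sum.

```python
from itertools import combinations


def communality_29(R, a, cluster):
    """Formula (29): sum of r_ai * r_aj over sum of r_ij, taken over all
    pairs i < j of cluster members other than a."""
    others = [i for i in cluster if i != a]
    num = sum(R[a][i] * R[a][j] for i, j in combinations(others, 2))
    den = sum(R[i][j] for i, j in combinations(others, 2))
    return num / den


def communality_30(R, a, cluster):
    """Formula (30): the same quantity obtained from row and submatrix sums."""
    others = [i for i in cluster if i != a]
    row_sum = sum(R[a][i] for i in others)
    row_sq = sum(R[a][i] ** 2 for i in others)
    all_pairs = sum(R[i][j] for i, j in combinations(sorted(cluster), 2))
    return (row_sum ** 2 - row_sq) / (2.0 * (all_pairs - row_sum))


if __name__ == "__main__":
    h = [0.9, 0.8, 0.7, 0.6]  # hypothetical true h values
    R = [[1.0 if i == j else h[i] * h[j] for j in range(4)] for i in range(4)]
    cluster = [0, 1, 2, 3]
    print(communality_29(R, 0, cluster), communality_30(R, 0, cluster))
```

Both functions need a cluster containing variable a and at least two other variables. For strictly congruent variables they return the true communality; for approximately congruent clusters they give approximation B.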


Approximation C: $h^2$ from converged squared r with an approximately congruent 
cluster domain 


In this approximation the general formula (25) is used. If values for $h^2$ 
were at the right, there would be a solution. Note that if the correct values 
of the $h^2$ terms were known and if $C_s$ were the variable-domain, $C_a$, then the 
general formula (25) would read 

(31) $h_a^2 = r_{aC_a}^2 = r_{aC_s}^2 = \dfrac{(h_a^2 + r_{ab} + r_{ac} + \cdots + r_{as})^2}{\sum h_i^2 + 2 \sum r_{ij}}$, $(i, j = a, \ldots, s;\ i < j)$. 

To get an approximate solution put trial values for the $h^2$ terms on the right, 
values that are known on domain-sampling grounds to be close to the correct 
values. Then iterate on the $h^2$ values until convergence, that is, until the $h_a^2$ 
on the right of (31) equals the $h_a^2$ on the left. The solution will not in general 
give the correct values because $r_{aC_s} \neq r_{aC_a}$ but will approximate the correct 
values according as the variables that compose $C_s$ approach congruence, a 
condition which, if met, would result in the identities of (31). 

The first step, then, is to get good trial values for the $h^2$ terms on the 
right of (31). To do so define the cluster domain score of $C_r$ only slightly 
differently from that defined in (24): let it be a composite score of an indefinitely 
large sample of variables representative of a, b, ..., s, that is, 

(32) $C_r = z_a + z_b + \cdots + z_s + z_{s+1} + \cdots + z_{n_\infty}$. 

Representativeness of the fully extended set of $n_\infty$ variables means that they 
have the same average correlation properties as the observed set of s variables, 

(33) $\bar r_{ai}$ for the full set b, ..., $n_\infty$ = $\bar r_{ai}$ for the observed set b, ..., s, 

and similarly for $\bar r_{bi}, \bar r_{ci}, \ldots, \bar r_{si}$; also 

(33a) $\bar r_{ij}$ for the full set a, ..., $n_\infty$ = $\bar r_{ij}$ for the observed set a, ..., s, (i < j). 

The virtual identity of definitions of (32) and (24) should result in $r_{aC_r} = r_{aC_s}$. 
From the correlation of sums 

$r_{aC_r} = \dfrac{1 + (n_\infty - 1) \bar r_{ai} \quad (i = b, \ldots, n_\infty)}{\sqrt{n_\infty + n_\infty (n_\infty - 1) \bar r_{ij}} \quad (i, j = a, \ldots, n_\infty;\ i < j)}$. 

Taking the limit, i.e., dividing numerator and denominator by $n_\infty$, substituting 
(33) and (33a), and squaring, 

(34) $h_a^2 = r_{aC_r}^2 = [\bar r_{ai}\ (i = b, \ldots, s)]^2 / \bar r_{ij}$, $(i, j = a, \ldots, s;\ i < j)$. 

There are no unknowns here. The approximation of $h_a^2$ from (34) is the 
simplest to compute, for in the submatrix of intercorrelations between the 
variables a, ..., s, it requires only the mean r in the row of variable a and 
the mean r over the whole submatrix. Call approximation (34): 





$h_a^2 = r_{aC_r}^2$ 


Approximation $D_1$: $h^2$ from squared r with an approximately congruent representative 
constant cluster domain 


Note that in the Guilford problem, Table 1, row 10a, approximation 
$D_1$ is not as accurate as approximation B, its absolute mean deviation 
from the true values being .032, compared with .021. 


Approximation $D_2$: $h^2$ from squared r with an approximately congruent representative 
shifting cluster domain 


An approximation that gives results closer to true values than $D_1$ is 
secured by a slight change in definition of the domain $C_{r2}$ from that given 
in (32). In this revised definition $C_{r2}$ excludes the variable whose $h^2$ is being 
computed, e.g., 

for $h_a^2$: $C_{r2} = z_b + z_c + \cdots + z_s + \cdots + z_{n_\infty}$; 
for $h_b^2$: $C_{r2} = z_a + z_c + \cdots + z_s + \cdots + z_{n_\infty}$; 

and so on. Thus, for the calculation of the $h^2$ of each variable in the cluster, 
the cluster domain "shifts" by virtue of excluding the given variable from 
it. Definition (33) remains unaltered in this case, but in definition (33a) 
both the full and the observed sets begin with b (a being excluded). In the 
limit, then, approximation $D_2$ takes the same general form as (34), that is 

(34a) $h_a^2 = r_{aC_{r2}}^2$ = same as (34), except that in the denominator 
i, j = b, ..., s; i < j. 


In Table 1, row 10b, note that the communalities by $D_2$ show a mean 
absolute deviation of .023, nearly as good as those by B, and better than those 
by $D_1$. 
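
The two approximations just described need only means of observed correlations, which the following sketch makes explicit. It is an illustration under stated assumptions: the function names `approx_d1` and `approx_d2` are hypothetical, (34) is read as the squared mean r in the row of variable a divided by the mean r over all pairs of the cluster, and (34a) as the same with variable a left out of the denominator mean.

```python
from itertools import combinations


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def approx_d1(R, a, cluster):
    """Formula (34): (mean r of a with the other members)^2 divided by the
    mean r over all pairs i < j of the cluster, a included."""
    others = [i for i in cluster if i != a]
    mean_row = mean(R[a][i] for i in others)
    mean_all = mean(R[i][j] for i, j in combinations(cluster, 2))
    return mean_row ** 2 / mean_all


def approx_d2(R, a, cluster):
    """Formula (34a): as (34), but the denominator mean excludes variable a."""
    others = [i for i in cluster if i != a]
    mean_row = mean(R[a][i] for i in others)
    mean_rest = mean(R[i][j] for i, j in combinations(others, 2))
    return mean_row ** 2 / mean_rest


if __name__ == "__main__":
    h = [0.9, 0.8, 0.7, 0.6]  # hypothetical true h values
    R = [[1.0 if i == j else h[i] * h[j] for j in range(4)] for i in range(4)]
    print(approx_d1(R, 0, [0, 1, 2, 3]), approx_d2(R, 0, [0, 1, 2, 3]))
```

`approx_d2` needs at least three variables in the cluster, since its denominator is a mean over pairs that exclude variable a. In this artificial example the shifting-domain value lies closer to the true .81 than the constant-domain value, in line with the comparison reported for the Guilford problem.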


Iteration for approximation C 


Inserting the values by $D_1$ or $D_2$ as the initial trial values in (31), one 
can iterate to convergence. The successive values form a geometric progression, 
so one can take the limit of such a progression as the final converged value. 
With the values of D, , as trial values, the converged limits, given in Table 
1, row 9, are seen to be equivalent to those by approximation B. 
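
The iteration on (31) can be sketched as a simple fixed-point loop. The code below is an assumption-laden illustration rather than the procedure actually used for Table 1: the function name `iterate_c`, the stopping tolerance, and the uniform starting values are all hypothetical, and the loop iterates directly instead of extrapolating the geometric progression mentioned in the text.

```python
def iterate_c(R, cluster, trial, tol=1e-12, max_iter=1000):
    """Fixed-point iteration on formula (31) for every variable in cluster.

    R       : correlation matrix (diagonal ignored)
    cluster : list of variable indices a, ..., s
    trial   : dict mapping each index to a starting value of h^2
    Returns a dict of converged h^2 values (or the last iterate).
    """
    h2 = dict(trial)
    pair_sum = sum(R[i][j] for i in cluster for j in cluster if i < j)
    for _ in range(max_iter):
        denom = sum(h2[i] for i in cluster) + 2.0 * pair_sum
        new = {a: (h2[a] + sum(R[a][i] for i in cluster if i != a)) ** 2 / denom
               for a in cluster}
        if max(abs(new[a] - h2[a]) for a in cluster) < tol:
            return new
        h2 = new
    return h2


if __name__ == "__main__":
    h = [0.9, 0.8, 0.7, 0.6]  # hypothetical true h values, congruent case
    R = [[1.0 if i == j else h[i] * h[j] for j in range(4)] for i in range(4)]
    result = iterate_c(R, [0, 1, 2, 3], {i: 0.5 for i in range(4)})
    print({i: round(v, 6) for i, v in result.items()})
```

In the congruent example the true communalities are a fixed point of (31), so the loop recovers them; with merely approximate congruence the converged values are approximation C.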


Approximation E: $h^2$ from a simple quadratic squared r with an approximately 
congruent representative cluster domain 


The cluster domain $C_r$ can be defined in still another slightly different 
way, as a composite of variables representative of the observed set a, b, 
..., s, but with the observed set deleted, as follows: 

(35) $C_{r'} = z_{a'} + z_{b'} + \cdots + z_{s'} + \cdots + z_{n_\infty}$. 

The variables a', b', ..., s' are, respectively, defined as parallel to the observed 
a, b, ..., s as stated in (1) and (2), and the $n_\infty$ extended set have the average 
statistical correlation properties as defined in (33) and (33a). Once again the 
virtual identity of the domain in (35) and that in (24) should yield $r_{aC_{r'}} = r_{aC_s}$. 
From the correlation of sums, taking the limit, and substituting (1) 
and the equivalents of (33) and (33a), 

(36) $r_{aC_{r'}} = \dfrac{(1/s) h_a^2 + [(s-1)/s] \bar r_{ai} \quad (i = b, \ldots, s)}{\sqrt{\bar r_{ij}} \quad (i, j = a, \ldots, s;\ i < j)}$. 

There are only two unknowns here, that on the left, and $h_a^2$ on the right. 
Recall that the domain $C_{r'}$ is an approximation to $C_s$, and it in turn is an 
approximation to $C_a$. Remember that $C_s$ would be $C_a$ if the s variables were 
congruent. That is, $h_a = r_{aC_s}$. 

Thus set $h_a$ on the left of (36). The result is a simple quadratic in $h_a$. 
Solving and squaring 

(37) $h_a^2 = \{[\sqrt{\bar r_{ij}} \pm \sqrt{\bar r_{ij} - 4[(s-1)/s^2] \bar r_{ai}}]/(2/s)\}^2$. 

(In $\bar r_{ai}$, i = b, ..., s. In $\bar r_{ij}$, i, j = a, ..., s; i < j.) 

$h_a^2$ is the square of the positive root between zero and unity. 

Approximation E to the communality in (37) should be about as good on 
domain sampling principles as B and C. In the Guilford problem, Table 1, 
row 11, the mean absolute deviation from the true values is about the same 
as for B and C, namely, .024 as against .021. 
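
A short sketch of approximation E follows. It assumes that (36), with $h_a$ set on the left, is the quadratic $(1/s) h_a^2 - \sqrt{\bar r_{ij}}\, h_a + [(s-1)/s] \bar r_{ai} = 0$, whose roots are given by (37). The function name `approx_e` is hypothetical, and returning the smaller root when both lie between zero and unity is a choice made here, not something stated in the source.

```python
import math


def approx_e(mean_row, mean_all, s):
    """Formula (37).

    mean_row : mean r of variable a with the other s - 1 cluster members
    mean_all : mean r over all pairs i < j of the s cluster members
    s        : number of variables in the cluster
    Returns h^2, or None when no real root lies between zero and unity.
    """
    disc = mean_all - 4.0 * ((s - 1) / s ** 2) * mean_row
    if disc < 0:
        return None
    root_all = math.sqrt(mean_all)
    for sign in (-1.0, 1.0):  # try the smaller root first
        h = (root_all + sign * math.sqrt(disc)) / (2.0 / s)
        if 0.0 < h <= 1.0:
            return h * h
    return None


if __name__ == "__main__":
    # Hypothetical cluster of three variables, all intercorrelations .50.
    print(approx_e(0.5, 0.5, 3))
```

When every correlation in the cluster equals the same value r, the admissible root gives $h_a^2 = r$, which agrees with the congruent case of (27) with equal communalities.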


Ad hoc estimates of the communality 


None of the above approximations are ad hoc estimates. All are formulated 
squared correlations between variable a and a defined domain score of the 
objects—domain scores that are not exactly equivalent but nevertheless 
close to the domain score C, required in the basic definition of communality 
in (1), (2), and (3). Sheer ad hoc guesses are likely to be far off the mark. 
In this category fall the use of highest $r_{ai}$ or Burt's modified highest r as 
recommended by centroid factorists ([1], p. 153 ff.). Note that in the Guilford 
problem, Table 1, rows 15 and 14, these two ad hoc estimates have absolute 
mean deviations from the true value of .083 and .067, respectively, about 
four times and three times as poor as approximations from B, C, and D. 

If an analyst wishes a quick ad hoc estimate he can do better than the 
above estimates by using one based on the following rationale: On the grounds 
that approximation $D_1$ by $r_{aC_r}^2$ and approximation E by $r_{aC_{r'}}^2$ should be 
roughly equivalent, let us therefore set the $h_a$ by (34) equal to that by (36) 
and solve for the one unknown, $h_a^2$. A little algebra will then show that 


(38) $h_a^2 = \bar r_{ai}$. 


To calculate this estimate one need only to discover, say, two or three reference 
variables with which variable a is most congruent, and compute its mean 
correlation with these reference variables. For the Guilford problem, Table 
1, row 13, this estimate has a mean absolute deviation of .050, much better 
than the other ad hoc estimates, but poorer than approximations B, C, 
D, and E. 
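
The algebra behind (38) can be written out in three lines. This is a reconstruction, assuming (34) is read as $h_a = \bar r_{ai}/\sqrt{\bar r_{ij}}$ and (36) as $h_a = \{(1/s) h_a^2 + [(s-1)/s] \bar r_{ai}\}/\sqrt{\bar r_{ij}}$, with both means taken over the same observed cluster:

```latex
\begin{align*}
\frac{\bar r_{ai}}{\sqrt{\bar r_{ij}}}
  &= \frac{(1/s)\,h_a^2 + [(s-1)/s]\,\bar r_{ai}}{\sqrt{\bar r_{ij}}} \\
s\,\bar r_{ai} &= h_a^2 + (s-1)\,\bar r_{ai} \\
h_a^2 &= \bar r_{ai}
\end{align*}
```

The common denominator cancels in the first line, so the estimate depends only on the mean correlation of variable a with its reference variables.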


Communality From Successive Residual Clusters (Factoring) 


The communality of a variable can be found, as shown earlier in the 
3rd and 4th definitions, by cumulating the partial communalities from k 
independent cluster dimensions $C_1, C_2, \ldots, C_k$ secured by a factoring 
procedure. Factoring can be performed in different ways, all special cases 
of general cluster formulations. 

General principles of factoring, or successive residual cluster analysis 


The factoring process, in general, follows this sequence: A first cluster 
$C_1$, consisting of variables a, b, ..., s, is selected, whose domain scores are 
defined as in (24). This domain score is 

(39) $C_s = C_1 = \sum z_a + \sum z_b + \cdots + \sum z_s$. 

The correlation of variable a with the domain score $C_1$ is then computed by 
the general formula for the correlation of a variable with a domain score 
given in (25). For any variable v that may not be included in $C_1$, the general 
formula is the same as (25) excepting for the numerator, namely, 

(40) $r_{vC_1}^2 = (\sum r_{vi})^2 / (\sum h_i^2 + 2 \sum r_{ij})$, $(i, j = a, \ldots, s;\ i < j)$. 

A second residual cluster is now selected composed of $s_2$ variables. Its 
domain score is $_1C_2$ (written simply as $C_2$), defined as a composite of the 
$\sum z$ scores of its $s_2$ variables. The correlation of variable a and variable v 
with $C_2$ is computed by formulas identical with (25) and (40), respectively, 
with the exception that the $h^2$ and r terms are residual communalities and 
correlations. The analyst continues to select additional successive clusters 
$C_3, \ldots, C_k$ until all residual correlations vanish. 


Special cases (Centroid, Key Cluster, Square Root, Principal Component, 
and other methods) 


The various factoring methods are merely different special cases of this 
general formulation [13]. Notice in the general definition of a cluster domain 
score by (24) there are two sets of parameters. The first is s, the number of 
variables selected out of the total n. A second refers to the number of parallel 
variables in each $\sum z$ term. Each $\sum z_i$ may be written as 

(41) $\sum z_i = z_i + z_{i'} + \cdots + z_{i_{n_i}}$. 

For the domain score $C_1$, $n_i$ equals $n_\infty$ within each term of (39). But it may 
be set at any finite number. Let us look at the parameters selected by the 
various factoring methods: 

In the Thurstone Centroid Method of factoring [1, 7], the analyst sets 
$s_1 = n$ in (39), that is, $C_1$ is the omnibus cluster domain score consisting 
of all n variables. The formula for the squared centroid factor loading 
may be recognized as our general formulas (25) and (40) with $s_1 = n$. Analogously, 
in this method one sets $s_2 = s_3 = \cdots = n$. For the second set of 
parameters, the centroid method sets $n_a = n_b = \cdots = n_\infty$ in (41). 
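
The centroid special case can be made concrete with a few lines of code. The sketch below is an illustration only: the function name `first_centroid_loadings` and the rank-one example are assumptions, and only the first factor is extracted. It applies (25) with s = n, so that each loading is the column sum of the correlation matrix (with the trial communality in the diagonal) divided by the square root of the sum of all entries.

```python
import math


def first_centroid_loadings(R, h2):
    """First-factor centroid loadings, i.e. formula (25) with s = n.

    R  : correlation matrix (diagonal ignored)
    h2 : trial communalities placed in the diagonal
    The squared loading of variable v is
    (h_v^2 + sum_i r_vi)^2 / (sum_i h_i^2 + 2 * sum_{i<j} r_ij).
    """
    n = len(R)
    col_sums = [h2[v] + sum(R[v][i] for i in range(n) if i != v)
                for v in range(n)]
    total = sum(col_sums)  # equals sum h_i^2 + 2 * sum_{i<j} r_ij
    return [c / math.sqrt(total) for c in col_sums]


if __name__ == "__main__":
    h = [0.9, 0.8, 0.7, 0.6]  # hypothetical true h values
    R = [[1.0 if i == j else h[i] * h[j] for j in range(4)] for i in range(4)]
    print(first_centroid_loadings(R, [x * x for x in h]))
```

For a congruent (rank-one) matrix with the true communalities in the diagonal, the first centroid loadings reproduce the h values exactly, so a single factor exhausts the correlations.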

In the Key Cluster Method as developed by the writer the analyst selects
as the successive residual clusters different groups of residual variables that
are most independent of each other. Here, $s_1, s_2, \cdots, s_k$ refer to the different
clusters and all are less than n. Here, also, $n_1 = n_2 = \cdots = \infty$. The gains
in efficiency, accuracy, and structural analysis of the CCC method over the
centroid method are developed elsewhere [10, 12].

In the Square Root Method of factoring, each successive cluster consists
of one residual variable (that is, s = 1) that shows maximal residual correlations,
and $n_1 = n_2 = \cdots = \infty$. The main proponent of this method is
Wrigley [14]. The writer also recommends this method for special situations
([9], Appendix D). There are other factoring methods that specialize in setting
the successive independent dimensions $C_1, C_2, \cdots, C_k$ as fallible composite
scores. Here, $n_1 = n_2 = \cdots = 1$. Notable is Hotelling's Method of Principal
Components [4], which also sets s = n as well as weights the scores of the
variables in the first and each residual composite. In this weighted case note
that (25) reduces to putting the magnitude 1.00 in place of the $h^2$ values.
Methods in which unities are put in the diagonal, as well as those that define
the residual clusters in such a fashion as to require reliability coefficients
in the diagonals instead of $h^2$ values in the general formula, cannot be further
elaborated here.


Determining $h^2$ by factoring


The procedure of computing the communalities of the n variables by
factoring is to set trial values of $h^2$ in the general formula (25). After the
factoring is complete, new trial values of $h^2$ are cumulated from the partial
communalities by formula (7); these are plugged back into (25) for the
second refactoring, new values obtained, and the process continued to
convergence. By the Key Cluster Method [12] convergence is on the true values;
in the Guilford problem, the results are shown in Table 1, row 2. But in the
centroid method as commonly used, the trial values taken are so inexact,
and the factoring procedure so arduous, that most centroid analysts rarely
refactor to convergence. Guilford does not, for example, give converged
values for his illustrative problem.

A common practice in centroid factoring is to accept as final the values
of $h^2$ secured by (7) as the result of the initial factoring. Another
approximation, then, is:


Approximation F: $h^2$ from initial partial communalities (or "squared factor
loadings")

The values in the Guilford problem by approximation F from centroid 
analysis are given in Table 1, row 4. Note that the values are about as good 
as those more simply calculated from single clusters (rows 8, 9, 10, 11), the 
mean absolute deviation from true values being for the centroid method 
.020, and for the single cluster methods about the same. 

For comparison, take the comparable values secured from initial factoring
by the Key Cluster Method in CCC analysis. These values for the Guilford
problem, in Table 1, row 3, are almost exactly the true values, their absolute
mean deviation being only .003. The poor showing of the centroid method
stems from its indiscriminate selection of all n variables for the successive
residual clusters, as well as the practice of choosing inferior ad hoc trial
values, as illustrated in Table 1, rows 14 and 15.


Communality From Multiple Clusters 


Another means of computing the communality of a variable stems from
the definition of the communality as the predictable variance from the
variable-domains organized by oblique clusters. Communality by definition 4
is the predictable variance from variable-domains organized through factoring
into k independent clusters $C_1, C_2, \cdots, C_k$ (9); in definition 5, the
variable-domains are taken separately as predictors $C_b, C_c, \cdots, C_n$ (21). Now group
them into oblique predictor cluster domains $C_I, C_{II}, \cdots, C_{k'}$.


Definition 7: $h^2$, the squared multiple R with k' oblique cluster domains


To form oblique cluster domains, $C_a, C_b, \cdots, C_n$ are grouped into k'
clusters in which the variables are as congruent as possible. Let, for example,


(42) $C_I = C_a + C_b + \cdots + C_{c_I} = \sum z_a + \sum z_b + \cdots + \sum z_{c_I}$,
(43) $C_{II} = C_m + C_n + \cdots + C_{c_{II}} = \sum z_m + \sum z_n + \cdots + \sum z_{c_{II}}$,


and so on to the $c_{k'}$th domain that exhausts the variables, the parallel variables
in each $\sum$ term satisfying definitions (1), (2), and (3). Since $C_I, C_{II}, \cdots, C_{k'}$
are linear composites of the n variable-domains, then by the definitions 4
and 5,


(44) $h_a^2 = R^2_{a \cdot C_I C_{II} \cdots C_{k'}}$.


As with (12) and (13) the regression equation and multiple $R^2$ (or $h^2$) may be
written


(45) $\tilde{C}_a = \tilde{z}_a = \sum_i \beta_{aC_i} C_i$, $i = I, II, \cdots, k'$;

(46) $h_a^2 = \sum_i \beta^2_{aC_i} + 2 \sum_{i<j} \beta_{aC_i} \beta_{aC_j} r_{C_iC_j}$, $i, j = I, II, \cdots, k'$.
One can evaluate (46) by a quick convergence process as follows:

(i) Evaluation of $r_{aC_i}$. Compute $r_{aC_i}$ by the general cluster correlation
formula (25), using approximation B for the communalities. Recall that for
any variable v not in a given cluster $C_i$, the same formula (25) applies
excepting that $h^2$ is not in the numerator, symmetrically with formula (40).
With $r_{aC_i}$ and $r_{vC_i}$ evaluated, the $\beta$ terms of (46) may then be solved.

(ii) Evaluation of $r_{C_iC_j}$. The remaining terms to be evaluated are
$r_{C_iC_j}$. To be concrete, consider the case of $r_{C_IC_{II}}$, where $C_I$ and $C_{II}$ are as
defined in (42) and (43). Recalling definitions (1), (2), and (3), this correlation
becomes, by the correlation of sums,


(47) $r_{C_IC_{II}} = \dfrac{\sum_{a \in C_I} \sum_{m \in C_{II}} r_{C_aC_m}}{\sqrt{\sum_{a, b \in C_I} r_{C_aC_b}} \, \sqrt{\sum_{m, n \in C_{II}} r_{C_mC_n}}}$.


To evaluate this expression, the augmented correlation, like $r_{C_aC_m}$, is needed.
This augmented correlation is ([12], formula 21)


(48) $r_{C_aC_m} = r_{am} / h_a h_m$.


To secure all the $r_{CC}$ values of (47) would require setting up the covariance
matrix of the variables of $C_I$ and $C_{II}$, augmenting the r's by the formula of
type (48) using the h values from step (i) above and then inserting unities
in the leading diagonal. Then the numerator of (47) is the sum of the terms
of the side submatrix between the variables of $C_I$ and $C_{II}$, the term in the
denominator under the first radical is the sum of the terms in the $C_I$ diagonal
submatrix, that under the second radical the sum of terms in the $C_{II}$ diagonal
submatrix.
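
The submatrix bookkeeping just described can be sketched in a few lines. This is only an illustration of the correlation of sums, not the writer's computing routine: the function name `correlation_of_sums`, the index lists, and the 4-variable matrix are all hypothetical, and the matrix is assumed to hold the augmented correlations with unities already placed in the leading diagonal.

```python
from math import sqrt

def correlation_of_sums(R, first, second):
    """Correlation between two unit-weighted sums of standardized variables.

    R is a full symmetric correlation matrix (list of lists) with unities in
    the leading diagonal; first and second list the row/column indices of the
    variables making up each composite.
    """
    # Sum of the "side" submatrix between the two groups (numerator).
    cross = sum(R[i][j] for i in first for j in second)
    # Sums of the two diagonal submatrices (under the two radicals).
    within_first = sum(R[i][j] for i in first for j in first)
    within_second = sum(R[i][j] for i in second for j in second)
    return cross / (sqrt(within_first) * sqrt(within_second))

# Hypothetical matrix for illustration only (not the Guilford data).
R = [
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
r_I_II = correlation_of_sums(R, [0, 1], [2, 3])
print(round(r_I_II, 3))
```

With larger clusters the same three sums are taken over larger submatrices, which is the labor that the shorter formula is meant to avoid.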


This work is unnecessarily laborious. An almost exact value of $r_{C_IC_{II}}$
can be secured by defining $C_I$ and $C_{II}$ as representative domains $C_{I_r}$ and $C_{II_r}$
defined in (32). In this case, by the correlation of sums


(49) $r_{C_IC_{II}} = r_{I\,II} / \sqrt{r_{I\,I}} \sqrt{r_{II\,II}}$.


In the Guilford problem where the exact value is known $r_{C_IC_{II}}$ is found by
the rigorous formula (47) to be .306; by the easy formula (49) it is .307.
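
For the simplest case of two oblique predictor clusters, a squared multiple R of the kind evaluated in (46) can be sketched as follows. The sketch uses the ordinary two-predictor regression weights; the function name and the two correlations of the variable with the clusters (.80 and .40) are hypothetical, and only the cluster intercorrelation .307 is taken from the text.

```python
def squared_multiple_r(r_a1, r_a2, r_12):
    """Squared multiple correlation of a variable on two correlated predictors.

    r_a1, r_a2: correlations of the variable with each predictor.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    # Standard regression (beta) weights for two predictors.
    beta_1 = (r_a1 - r_a2 * r_12) / denom
    beta_2 = (r_a2 - r_a1 * r_12) / denom
    # Sum of squared betas plus twice the cross-product term.
    return beta_1 ** 2 + beta_2 ** 2 + 2.0 * beta_1 * beta_2 * r_12

r_12 = 0.307               # cluster intercorrelation quoted in the text
r_a1, r_a2 = 0.80, 0.40    # hypothetical correlations of variable a with the clusters
h2_a = squared_multiple_r(r_a1, r_a2, r_12)
print(round(h2_a, 3))
```

When the two predictors are uncorrelated the cross-product term vanishes and the result is simply the sum of the two squared correlations.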


(iii) Iteration to $h^2$. The process is first to find the initial $h^2$ values by steps
(i) and (ii) above, then plug these values back into (46) on the right,
recompute the $h^2$ values and continue until convergence of the $h^2$ values. The
process is speedy, as will be seen below, where it is seen that even this work
can be shortened.


Definition 8: $h^2$, the squared multiple R with k most independent oblique cluster
domains


To use all k’ clusters as predictors is inefficient. All that are necessary 
and sufficient are k predictor clusters—the number k being that found if one 
factored the correlation matrix. 

This fact can be demonstrated from the Guilford matrix that was con- 
structed artificially from two dimensions. With this foreknowledge, the 
writer clustered the matrix, the ten variables going into k’ = 3 clusters. 
From these he chose the k = 2 most independent, that is, whose variables 
correlated the lowest. The communalities of the variables were then computed 
by (46), following the three steps outlined above. The first round gave the 
initial communalities shown in Table 1, row 12a. They miss the true values
on the average by only .006. But convergence to the true values is rapid; on
the 4th iteration the values in row 12b show a mean absolute deviation less
than .001.

Since the analyst does not know the value of k ahead of time, a procedure 
that provides an efficient solution is needed. If for each variable, one orders 
the r’s with the k’ clusters in order of magnitude, then one can choose as 
predictor clusters only those that have significant correlations with the 
variable. Employing (47) for these predictor clusters should give close to the 
exact value of the variable’s communality. 


Summary and Conclusions 


The communality $h_a^2$ of a variable a refers to its generality across n − 1
other behaviors b, c, $\cdots$, n. It is the variance of scores on a, predicted from
the domain score $C_a$, where $C_a$ is a construct composite of scores on a large
number of different behaviors whose correlations with the n − 1 variables
are proportional to those of variable a. It may thus be written


(5), (30) $h_a^2 = r^2_{aC_a}$.


It follows that the communality is (i) the correlation coefficient between
scores on a and scores on another construct behavior a' having exactly the
same coefficients as a with the other n − 1 variables, i.e.,


(1) $h_a^2 = r_{aa'}$,

and (ii) the variance of a predicted from the remaining n − 1 variable-domains
$C_b, C_c, \cdots, C_n$, that is, it is the squared multiple R,

(11) $h_a^2 = R^2_{a \cdot C_b C_c \cdots C_n}$.


In this form it may be calculated by an electronic computer from the simul- 
taneous quadratic formula (21), in which the set of h values of the n — 1 
variables secured from approximation B, formula (29), initiate the iterative 
process. 

But the domains $C_a, C_b, \cdots, C_n$ that predict the communalities of all
n variables can always be grouped without loss of predictive power into k
independent cluster or residual cluster dimensions ("factors")
$C_1, C_2, \cdots, C_k$, whence the predicted variance of a is the squared multiple R,


(9) $h_a^2 = R^2_{a \cdot C_1 C_2 \cdots C_k}$.


The computing form of (9), suitable for desk calculator or electronic computer
operations, requires cumulating the k partial communalities:


(7) $h_a^2 = r^2_{aC_1} + r^2_{aC_2} + \cdots + r^2_{aC_k} = h^2_{a1} + h^2_{a2} + \cdots + h^2_{ak}$.
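
The cumulation in (7) amounts to summing the squared correlations of a variable with the k independent dimensions. A minimal sketch follows; the function name and the two correlations (.70 and .40) are hypothetical values chosen for illustration, not figures from the Guilford problem.

```python
def cumulated_communality(correlations_with_dimensions):
    """Sum of the partial communalities, i.e. of the squared correlations of
    one variable with each of the k independent cluster dimensions."""
    return sum(r * r for r in correlations_with_dimensions)

# Hypothetical correlations of variable a with k = 2 independent dimensions.
r_a = [0.70, 0.40]
h2_a = cumulated_communality(r_a)
print(round(h2_a, 2))
```

In refactoring, a value obtained this way would serve as the next trial communality.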


By the Key Cluster Method of factoring [12] the partial communalities are
secured initially by approximation B, (29) or (30). After the first factoring
the improved estimates of the communalities by (7) permit refactoring to
secure better estimates. Refactoring continues until convergence of the
communalities.

The predictor domains $C_b, C_c, \cdots, C_n$ in (11) can also, without loss of
predictive power, be grouped into k' oblique cluster domains,
$C_I, C_{II}, \cdots, C_{k'}$, subject to the limiting conditions that k' > k and that the k' cluster
domains lie in k space, that is,


(44) $h_a^2 = R^2_{a \cdot C_I C_{II} \cdots C_{k'}}$.


The computing form of (44) is given in (46), which is generally symmetrical
with (21) but with fewer predictor variables. The practical limitation of this
formulation is the need to know k.

Approximations to the communality, useful especially as values to
initiate iterations to the exact value, will approach this value to the degree
that the approximation formulas employed meet the definition of the
communality. By the definition expressed in (5) it follows that approximations
B, C, D, and E give quickly computed values close to the exact value
according as the variables that comprise the reference cluster of the variable
have correlations congruent with it. By approximation F, the cumulated
partial communalities by (7) or (9) resulting from only the first factoring
require more work than approximations B, C, D, and E; in the Guilford
example, values secured by key cluster factoring are considerably more
exact than by the centroid. Approximation A, the squared multiple R, is
clearly a biased estimate of (11), the predictor variables being the n − 1
fallible measures rather than the required n − 1 variable-domains. The
highest r and modified highest r are poor estimates, since by definition (1)
there is no implication that the required perfectly congruent parallel variable
a' is, or should be, well represented by that observed variable with which a
correlates highest.

The present account has been concerned with the logic and algebra of 
communalities. Three methods have been shown to give the exact values 
for an artificial matrix with known theoretical population communalities. 
We need to evaluate the three methods on empirical matrices. This problem
is more difficult because of lack of knowledge of the population values and 
of the need to take into account the sampling errors of the communalities. 
It would seem that to treat the communality as a squared modified R as 
given in (11) should simplify the problem of deriving this sampling error. 


REFERENCES 


[1] Cattell, R. B. Factor analysis. New York: Harper, 1952.

[2] Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

[3] Guttman, L. Multiple rectilinear prediction and the resolution into components.
Psychometrika, 1940, 5, 75-99.

[4] Hotelling, H. Analysis of a complex of statistical variables into principal components.
J. educ. Psychol., 1933, 24, 417-41, 498-520.

[5] Kaiser, H. Solution for the communalities: a preliminary report. Contract Report 5,
27 Sept. 1956, Univ. Calif., Contract No. AF 41(657)-76.

[6] Spearman, C. The abilities of man. London: Macmillan, 1927.

[7] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.

[8] Tryon, R. C. Cluster analysis. Ann Arbor: Edwards Bros., 1939.

[9] Tryon, R. C. Identification of social areas by cluster analysis. Univ. Calif. Publ.
Psychol., 1955, 8, No. 1, 1-100. Berkeley: Univ. Calif. Press.

[10] Tryon, R. C. General dimensions of individual differences: Cluster analysis vs.
multiple-factor analysis. Amer. Psychol., 1956, 11, 479. (Title)

[11] Tryon, R. C. Reliability and domain validity: Reformulation and historical critique.
Psych. Bull., 1957, 54, 229-249.

[12] Tryon, R. C. Cumulative communality cluster (CCC) analysis. Contract Report 7,
8 Nov. 1956. Univ. Calif. Contract No. AF 41(657)-76.

[13] Tryon, R. C. Domain sampling formulation of cluster and factor analysis. Contract
Report 18, July 1957, Univ. Calif. Contract No. AF 41(657)-76.

[14] Wrigley, C. F., Cherry, C. N., Lee, M. C., and McQuitty, L. L. Use of the square-root
method to identify factors in the job performance of aircraft mechanics. Psychol.
Monogr., 1956, 70, No. 23. (Whole No. 430.)

[15] Wrigley, C. F. An empirical comparison of various methods for the estimation of
communalities. Contract Report 1, 30 June 1956. Univ. Calif. Contract No. AF
41(657)-76.


Manuscript received 12/17/56
Revised manuscript received 2/25/57


PSYCHOMETRIKA—VOL. 22, NO. 3 
SEPTEMBER, 1957 


AMOUNTS OF FIXATION AND DISCOVERY 
IN MAZE LEARNING BEHAVIOR 


Herbert A. Simon*


CARNEGIE INSTITUTE OF TECHNOLOGY 


The proposed quantitative description of maze learning rests on the
assumption that two independent processes are involved: (i) a discovery
process based on trial-and-error search for the correct response, (ii) a fixation
process equivalent to that observed in serial learning. The model leads to
predictions that are consistent with the available experimental data. In 
particular, the number of trials required for fixation is independent of the 
number of alternatives at each choice point (and hence independent of the 
number of bits of information contained in each correct response). 


To learn the correct path through a maze, a subject must first discover 
the path and then fixate it in memory. The distinction between the processes 
of discovery and fixation is well known in the psychological literature [4] 
but has not been much used for analyzing learning experiments quantitatively. 
In this paper, it is shown how amounts of discovery and fixation can be 
estimated from the structure of a maze learning task, how these amounts 
can be used to predict number of trials and number of errors to criterion 
and how an analysis of maze learning in these terms brings these experiments 
into relationship with classical experiments on the learning of lists of nonsense 
syllables. 


The Theoretical Model 


To run a maze successfully, the subject must make a correct sequence
of responses, e.g., "left, left, up, right, left, up." At each choice point, the
response must be selected from a specified list of alternatives. There is a
specified number of choice points. Hence, a particular maze may be characterized
by two parameters: n, the number of alternative paths, or possible
responses, at each choice point; and L, the number of choice points, or length
of the maze. As measures of learning only total errors to criterion E and
trials to criterion T will be considered.

It is postulated that the learning behavior involves two simple processes: 
one of discovery and one of fixation. It is convenient to discuss them in 
reverse order. 


*I am grateful to W. J. Brogden, G. A. Miller, R. F. Thompson, and J. Voss for their
helpful comments on an earlier draft of this paper.


Fixation

The learning may occur under either the correction or the non-correction 
method. Under the correction method, the subject makes a sequence of 
responses at each choice point until he makes the correct response; he is then 
informed that it was correct and proceeds to the next choice point. Under 
the non-correction method, if the subject’s first response is incorrect, he is 
told the correct response and then proceeds to the next choice point. Under 
both methods one correct response is reinforced on each trial at each choice 
point. When animals are used as subjects, and when the learning involves a 
physical maze, the correction method is ordinarily used. In experiments on 
human learning in verbal mazes, either the correction or the non-correction 
method may be used. 

Ignoring, for the moment, the activities of the subject in searching for 
the correct response (i.e., the discovery activities) and the range of alternative 
responses open to him, it is seen that the task of fixating the correct response 
in a verbal maze learning experiment is the same as the task of fixating the 
correct response in learning a series of syllables or digits. Assuming that 
these two processes are, indeed, the same, one is justified in using the ex- 
perimental findings on serial learning to predict the course of the fixation 
process in maze learning. This assumption will be validated by testing the 
predictions to which it leads, and by relating it to other recent findings on 
memory processes. 

Available data on serial learning ([3], p. 620, Fig. 8) show, for sequences
of the lengths employed in maze learning experiments (L no greater than 24),
that the total number of trials to criterion increases monotonically with the
length of the sequence:


(1) $T = f(L)$, with $dT/dL > 0$.


T appears to increase proportionately with L; within the range considered,
the departure from proportionality is not large. Accepting the
evidence for proportional increase, (1) may be specialized to the linear
relation


(1') $T = bL$, $b > 0$.


Both the general and special forms of the function will be used in what 
follows. 

An initial hypothesis is that these functions, obtained from the empirical 
data on fixation in serial learning, are applicable to the fixation process in 
maze learning. It may be remarked that b depends both on the ability of the 
subjects and the difficulty of the material to be learned. For nonsense syllables 
with low association value, values of b in the neighborhood of 1 are often 
reported; it is remarkable that values from studies carried out at widely 
different times are quite similar ([3], p. 620, Fig. 8). Whether or not this
relative constancy carries over to learning in verbal mazes will be discussed
in a later section.

Implicit in the hypothesis that the function (1) applies to the fixation
process in maze learning is the very strong assumption that T is independent
of n. It is not at all obvious that the difficulty in fixating the correct response
at a choice point in a maze should be independent of the number of alternative
responses at that point. It would be plausible to assume that the difficulty
of fixation would increase with the amount of information contained in a
correct response. But, by definition, the amount of information in a response
is proportional to the logarithm of the number of alternative responses:
one bit for a choice between two alternatives, two bits for a choice between
four alternatives, three for a choice between eight alternatives, and so on.
If difficulty of fixation depended on amount of information, T would depend
on n.
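
The information measure referred to here is the base-2 logarithm of the number of equally likely alternatives. A short sketch reproduces the three cases just mentioned; the function name `bits_per_response` is an illustrative label, and equal likelihood of the alternatives is assumed.

```python
from math import log2

def bits_per_response(n):
    """Information in one correct response chosen among n equally likely
    alternatives, in bits."""
    return log2(n)

for n in (2, 4, 8):
    print(n, bits_per_response(n))
```

Under the hypothesis discussed in the text, T would not vary with this quantity.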

Regardless of the plausibility of the assumption that T is independent
of n, it is made tacitly in the whole body of experimentation that has been 
carried out on serial learning. The grounds for this assertion are these: a 
subject, learning a series of nonsense syllables or digits, is at each step trying 
to select the correct response from some range of possible responses. But 
(with the exception of a recent experiment mentioned below) it has not 
been usual for the experimenter to specify for the subject the range of possible 
or admissible responses. The particular nonsense syllables used are selected 
from a much larger class upon which the subject could presumably draw in 
trying to choose the correct response. The size of this class is a possible 
source of variance in the fixation process that has not generally been con- 
trolled in the classical experiments. In the learning of verbal mazes, the 
number of alternatives at each branch point is made explicit, and hence the 
independence of T from n can be subjected to direct test. The experimental
findings are consistent with the hypothesis of independence. 

The finding that the number of trials required to learn a sequence of 
syllables depends primarily on the number of syllables to be learned and not 
upon the number of bits of information per syllable is parallel with the data 
of Miller [5] on the span of immediate memory, recently corroborated directly 
by Miller and Smith [6] for rote memory. These results suggest the need of 
caution and sophistication in applying measurements from statistical in- 
formation theory to learning experiments. As Miller [6] has pointed out, 
information measurements appear to be directly applicable to certain ex- 
periments in perception and discrimination but not to memory span experi- 
ments. These findings provide additional justification for the absence of n 
from (1), above. 


Discovery 


For purposes of simplifying the development, assume (i) when the
subject is at a particular choice point on a particular trial, he either knows
or does not know the correct response; (ii) if the former, he gives the correct
response at once; (iii) if the latter, he tries responses at random until he hits
on the correct one. It is well known, empirically, that assumption (iii) is not
literally correct, since subjects ordinarily try alternatives in a patterned way.
However, as long as the pattern is (for the average over all subjects)
uncorrelated with the location of the correct response, results will be unchanged,
and there may be no need for a more accurate but more complicated substitute
for (iii).

Under the above assumptions, the expected number of errors $E_t$ on a
given trial will be proportional to the product of the average number of errors
per random search, $\tfrac{1}{2}(n - 1)$, and the number of unlearned responses at the
beginning of the trial, $U_{t-1}$:

(2) $E_t = \tfrac{1}{2}(n - 1) U_{t-1}$.
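
The average number of errors per random search can be checked by direct enumeration. The sketch below assumes that the subject tries the n alternatives in a random order without repeating a wrong response, which is the reading under which the average is half of n − 1; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def mean_errors_random_search(n):
    """Exact mean number of wrong responses tried before the correct one,
    when the n alternatives are tried in random order without repetition."""
    orders = list(permutations(range(n)))
    # Alternative 0 stands for the correct response; its position in an
    # order equals the number of wrong responses tried before it.
    total = sum(order.index(0) for order in orders)
    return Fraction(total, len(orders))

for n in range(2, 7):
    print(n, mean_errors_random_search(n))
```

Multiplying this average by the number of responses still unlearned gives the expected errors on a trial.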

Drawing again upon the empirical data that describe the fixation process
in serial learning, some of the regularities in these data allow the derivation
from (2) of an equation for E, the total number of errors to criterion. In
particular, it is assumed that the Kjersted-Robinson law (cf. [3], p. 619)
applies to the fixation process in maze learning. This law asserts that the
percentage of responses learned through the t-th trial, $(L - U_t)/L$, is a function
only of $t/T$, say:

(3) $(L - U_t)/L = g(t/T)$.

By the definition of E as the sum of the $E_t$, and by using (3) to eliminate
$U_{t-1}$ from the right side of (2),


(4) $E = \sum_{t=1}^{T} E_t = \tfrac{1}{2}(n - 1) \sum_{t=1}^{T} L[1 - g((t-1)/T)] = \tfrac{1}{2}(n - 1) L \sum_{t=1}^{T} [1 - g((t-1)/T)]$.
But, by a well-known theorem on homogeneous functions,


(5) $\int_0^T g(t/T)\,dt = T \int_0^1 g(\lambda)\,d\lambda = KT$, where K is a constant.


Since the integral of (5) is an approximation to the last term of the sum on
the right of (4), then (4) may be rewritten approximately:


(6) $E = \tfrac{1}{2}(1 - K)(n - 1)LT$.
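
As a rough numerical check on the step from the sum to the closed form (6), the following sketch adds up the per-trial errors directly and compares the total with (6). The learning curve g(x) = x^(2/3) is purely hypothetical; it is chosen only because its integral over the unit interval is .6, so that 1 − K = .4. The function names and the maze parameters are likewise illustrative assumptions.

```python
def errors_by_sum(n, L, T, g):
    """Total errors obtained by summing the per-trial expectation over trials."""
    total = 0.0
    for t in range(1, T + 1):
        # Unlearned responses at the start of trial t, from the learning curve.
        unlearned = L * (1.0 - g((t - 1) / T))
        total += 0.5 * (n - 1) * unlearned
    return total

def errors_closed_form(n, L, T, one_minus_K):
    """Closed-form approximation (6)."""
    return 0.5 * one_minus_K * (n - 1) * L * T

def g(x):
    """Hypothetical learning curve whose integral over [0, 1] is 0.6."""
    return x ** (2.0 / 3.0)

n, L = 4, 16
for T in (12, 200):
    summed = errors_by_sum(n, L, T, g)
    closed = errors_closed_form(n, L, T, 0.4)
    print(T, round(summed, 1), round(closed, 1))
```

For this particular curve the sum runs somewhat above the closed form when T is small, and the two agree more closely as T grows.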


Finally, substituting (1) or (1') in (6):


(7) $E = a(n - 1) L f(L)$ (a = a constant), or
(7') $E = a'(n - 1) L^2$, respectively.


From (6) it is apparent that the number of errors to criterion will vary
proportionately with the number of alternatives at each choice point, more
precisely, with (n − 1), and with the product of the length of the maze by
the number of trials. If the number of trials is, by (1'), proportional to the
length, the number of errors, by (7'), will be proportional to the square of
the length. All the quantities appearing in (1'), (6), and (7') are observables,
and hence the equations can be fitted to data on learning performance in
mazes. In fitting (1') and (7'), one degree of freedom corresponding to the
constant of proportionality, b or a', is lost.

From the data of Robinson and Darrow reported by Hovland ([3], p.
619, Table 1), one can make a numerical estimate of (1 − K) in (6), obtaining
the value .41. Since this constant has only an empirical basis, it cannot be 
expected to be exact; in what follows the approximate value .4 will be used. 
If (1 — K) is estimated at .4, no degrees of freedom are lost in fitting (6). 


The Experimental Data 


The two principal bodies of evidence for testing the model are: (i) a
series of experiments by Brogden and his associates [1, 2, 8] on human learning
in verbal mazes of various lengths and numbers of branch points; (ii) a
series of experiments with animals by Scott and Henninger [7] using
two-alternative mazes of various lengths.


Number of trials to criterion 


All these data support equation (1’). Brogden and Schmidt [1, 2] obtain 
an average value for b of .75 for 16-unit mazes, and .83 for 24-unit mazes. 
Thompson [8] obtains an average b of .75 for 12-unit mazes. Since the fixation 
tasks in all three sets of experiments were of about the same difficulty, and 
since the subjects were drawn from the same population (volunteers from the 
introductory psychology course at the University of Wisconsin), the relative 
constancy of the values of b cannot be regarded as a mere artifact. For this 
reason it is justified to pool the data from all three sets of experiments, using 
a single averaged value of b. (On the other hand, Scott and Henninger [7] 
in experiments using a variety of maze designs and animal subjects obtained 
values for b ranging from .7 to 2.) Finally, it is noted that Thompson [8] 
compared the correction and non-correction methods and found no significant 
difference between them in the relation of length of maze to trials to criterion. 

That the number of trials is independent of number of alternatives at
each choice point, as postulated in (1), is shown by the data of Brogden and
Schmidt [1, 2] and of Thompson [8]. In all three sets of experiments, the
average number of trials to criterion was not significantly related to number
of alternatives, except that the average number of trials was usually
significantly low for mazes having only two alternatives at each choice point.
Hence, the assumption of independence holds strictly only for n greater than
2. Thompson reports data for a 12-unit maze with n ranging from 2
to 6; Brogden and Schmidt report data for a 16-unit maze with n
ranging from 2 to 8, and for a 24-unit maze with n ranging from 2
to 12.

That the number of trials varies proportionately with maze length is 
shown by the near-equality in values of b obtained with mazes of similar 
design of lengths 12, 16, and 24. Table 1 provides additional evidence in the 
form of a comparison of the actual with estimated values of T for mazes of
various lengths in four of the designs studied by Scott and Henninger [7]. 


Number of errors to criterion 


Having tested that part of the theory which relates to number of trials,
the data on errors are now to be considered. Brogden has made available to
the writer the data on numbers of errors and trials to criterion for the 210
individual subjects in the experiments reported in [1] and [2]. Using the
estimated value of .4 for (1 − K), the experimentally determined values
for n and L, and the observed values of T, call them $T_i$, E is estimated for
each subject by (6). Let $E_i$ be the observed values of E, and $E_i^*$ be the values
estimated from (6). It has been pointed out above that no degrees of freedom
are lost in this process, since all the parameters in (6) are estimated
independently of the observed $E_i$.

The mean $E_i$ for all 210 subjects is 376.7, and the standard deviation is
347.8. Designating the error of estimate, $d_i = E_i - E_i^*$, the arithmetic
mean error (the mean of $d_i$) is only −12.48. This implies that the least
squares estimate of (1 − K) is about .39, instead of the .4 estimated from
the serial learning data; for the mean of the $E_i$, 376.7, is about 39/40 of the
mean of the $E_i^*$, 389.2.

More remarkable, the estimates from (6) account for 93.7 per cent of
the variance in the $E_i$. The variance of the $E_i$ is 5,334 X 10°; the variance
of the $d_i$ is 335 X 10°. The latter is only 6.3 per cent of the former. The
coefficient of variation, the ratio of the standard deviation of the $d_i$ to the
mean of the $E_i$, is .23.
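
The two percentages follow from the ratio of the two variances alone. The sketch below repeats that arithmetic; it assumes that both variances are reported with the same power of ten, so that the common factor cancels, and the variable names are illustrative.

```python
# Leading figures of the two variances as reported; the common power of ten
# is assumed to be the same for both and therefore cancels in the ratio.
variance_observed = 5334.0   # variance of the observed errors
variance_residual = 335.0    # variance of the errors of estimate

share_unexplained = variance_residual / variance_observed
share_explained = 1.0 - share_unexplained

print(f"unexplained: {100 * share_unexplained:.1f} per cent")
print(f"explained:   {100 * share_explained:.1f} per cent")
```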

The entire theory can be subjected to a severe test by estimating E
in the experiments of Brogden and Schmidt and of Thompson from (7'),
employing a single average value of a' obtained from the whole set of
experiments. Estimating b = .75 from the experimental data, $a' = \tfrac{1}{2}(.4)(.75) = .15$.
Substituting in (7') the values 12, 16, and 24 for L gives the corresponding
equations for E: E(12) = 21.6 (n − 1); E(16) = 38.4 (n − 1); E(24) =
86.4 (n − 1). The least squares regressions reported by Thompson [8] and by
Brogden and Schmidt [1, 2] are: E(12) = 31.1 (n − 1) − 24.7; E(16) =
34.1 (n − 1) + 21.3; E(24) = 93.4 (n − 1) + 28.0. Considering that the
only degree of freedom lost was that used to estimate b, these equations must
be regarded as a remarkably close fit. Moreover, if the regression line is
constrained to pass through the origin, as required by the theory, the actual
data would give E(12) = 23 (n − 1); E(16) = 37 (n − 1); and E(24) = 96
(n − 1). This provides, on the whole, an even closer fit.
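
The predicted coefficients of (n − 1) can be recomputed from the constants given in the text. The sketch below does so and lists them beside the through-origin coefficients quoted above; the variable names are illustrative, and the constants .4 and .75 are the estimates stated in the text.

```python
one_minus_K = 0.4   # estimate of (1 - K) from the serial learning data
b = 0.75            # estimate of b from the maze experiments

# Constant of proportionality: one half of (1 - K) times b.
a_prime = 0.5 * one_minus_K * b

# Predicted coefficient of (n - 1) for each maze length L.
predicted = {L: a_prime * L ** 2 for L in (12, 16, 24)}

# Coefficients quoted in the text for regressions through the origin.
through_origin = {12: 23.0, 16: 37.0, 24: 96.0}

for L in (12, 16, 24):
    print(L, round(predicted[L], 1), through_origin[L])
```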

The test of (7') may also be made with the data for 210 individual
subjects that were used to test (6). Let $E_i'$ be the estimate of $E_i$ from (7'),
taking b = .8. Then the mean $E_i'$ = 350.1, which is about 5 per cent below
the observed mean. (Stated otherwise, a least squares estimate of b would
give a value of about .84.) The variance of the ($E_i' - E_i$) is 1,039 X 10°,
or 19.5 per cent of the variance of the $E_i$. Hence, the estimate from (7')
accounts for just over 80 per cent of the variance, as compared with the
nearly 95 per cent accounted for by estimating the $E_i$ from (6). The additional
source of estimation error lies, of course, in the deviations of the actual
values of the $T_i$ from the values, T, estimated by (1').

In their first experiment, Brogden and Schmidt ([1], p. 239) adduce one 
piece of direct evidence for (2), which is required for the derivation of (6). 
Since $U_1$, the number of unlearned responses at the beginning of the first 
trial, is equal to $L$, from (2) 

(8)    $E_1 = \tfrac{1}{2}(n - 1)U_1 = \tfrac{1}{2}(n - 1)L$. 

Brogden and Schmidt find that the data fit (8) within sampling error. 
INFORMATION FILTERS* 

When an information processing system is faced with an excess of input 
information the task of selecting the items which are to get immediate proc- 
essing is frequently assigned to a human being. A quantitative measure of 
the extent to which a man avoids random activity during such filtering 
operations is derived in terms of two parameters (normalized overload and 
correct proportion of selections) which are determined from experimentally 
available quantities. This coherence measure may be used for studies of 
random behavior, comparison of rules for selecting items, and perhaps pre- 
diction of human performance at filtering tasks. 


In this modern age a situation frequently occurs in which an information 
processing system—machine and/or human—is presented with information 
at a rate greater than it can handle. More calls may be placed at a telephone 
switchboard than there are outgoing lines; the rate of plane arrivals at an 
airfield may be greater than the tower operator can handle; a radar may 
display more returns than the tracker can track. In such situations, some 
items are selected for immediate processing while others are filtered out, 
being either rejected or stored for later processing. 

Such filtering is often performed by a human being. If he acts according 
to a single complete set of unambiguous rules, he can be said to be operating 
coherently; but if he makes some selections by chance (casual selection) or 
alternates between several sets of rules by chance (desultory selection), he 
can be said to be operating at least in part randomly. A thorough investigation 
of such filtering operations should include not only the human filter’s capacity 
and accuracy but also the extent to which his activities are random. 

This paper presents a direct quantitative measure of the extent to which 
a human filter avoids random activity in the selection of informational 
items during periods of overload. In the course of processing information, a 
human may perform various transformations, integrations, and other oper- 
ations, but the only activity considered here is his selection of the items to 
be processed further. The human filter will be represented schematically 
by a box (Fig. 1). Transfer characteristics are defined within the box, but 
these are solely for the purpose of relating the inputs and outputs; such 
characteristics are not necessarily analogous to the actual mental processes 
involved in selection, nor are they intended to be. 

FIGURE 1 
Schematic Representation of a Human Information Filter 
S refers to inputs or stimuli; R refers to outputs or responses; 
T refers to transfer channels relating the inputs to the outputs 

*The author is particularly indebted to Dr. Harold Glaser, of the U. S. Naval 
Research Laboratory, for many helpful suggestions during the development of 
this measure. 

Several measures of the filter performance may be obtained from the 
inputs, transfer characteristics, and outputs. The simplest would be the 
ratio of correct selections to either total number of selections or total number 
of inputs, but these do not take random activity into account. Quastler and 
Buckley [3] employed similar functions in a measure, based on information 
theory, which provides the information one receives by being informed of the 
filter’s specific selections. Errors are treated as equivocation, so that when 
the filter is operating in a completely random manner no information is 
obtained. It was this work which first interested the author in the human 
filtering problem. However, a direct measure of the extent of randomness, 
or lack of it, would be more convenient in selecting personnel for filtering 
activity and establishing proper priority rules for them to use. 


Derivation of the Measure 


In this derivation capital letters are used as labels for stimuli, transfer 
channels, and responses (Fig. 1); corresponding lower case letters refer to 
the number of items in these groups or passing through these channels. The 
human filter is confronted with a total of $s$ informational items, or stimuli, 
but when $s$ exceeds the number of items he can handle in the time available 
he selects only $r_a$ of them. His task is to divide the total set ($S$), 
containing $s$ items, into two sub-sets according to a complete program of 
unambiguous rules so that the sub-set he selects ($R_a$) contains items each 
having higher priority than any in the sub-set which he rejects ($R_b$). 
However, the items comprising his selections may or may not be those which 
should have been selected, according to the rules. $S_a$ will be defined to 
be the theoretically correct sub-set of higher priority stimuli which he 
should select, so that $s_a = r_a$ in number, and the remaining stimuli will 
be grouped into $S_b$. Since the filter may make any number of selections 
between zero and $s$, the rules of selection must be such that all items in 
$S$ can be ranked according to priority so that exactly $s_a$ items can always 
be found such that each of them is of higher priority than any of the 
remaining items. However, once the magnitude of $s_a$ is determined from 
$r_a$ and the specific constituents of $S_a$ have thereby been established, 
their relative priority is of no further concern. 

If the human filter is operating completely coherently, i.e., exactly 
according to the rules, his responses $R_a$ are made only from the sub-set of 
correct stimuli $S_a$; he is selecting only through the transfer channel 
$T_{aa}$. In this case $t_{aa} = r_a$. His rejections $R_b$ comprise the other 
sub-set $S_b$, so that he is rejecting only through the transfer channel 
$T_{bb}$. 

However, if he makes errors in his filtering activities, some of his 
selections must be made from the lower sub-set of stimuli $S_b$. Thus, in 
addition to his correct selections through transfer channel $T_{aa}$, he is 
making some erroneous selections through channel $T_{ba}$; the items of the 
sub-set $S_a$ which he does not include in $R_a$ because of incorrect 
selections must be rejected through transfer channel $T_{ab}$. 

Suppose the human filter makes some of the selections according to the 
rules and then either guesses at the rest or reverts at random to an alternative 
set of rules. The desired measure should indicate what proportion of the 
responses were made coherently. This is similar to the problem faced in 
compensating for guessing in multiple-choice tests, and the same method 
of solution can be adapted to develop a proper measure here. In general, the 
procedure is to (i) count the number of erroneous selections, (ii) compute 
the number of guesses from which that number of errors was most likely to 
result, (iii) subtract the number wrong from the probable number of guesses 
to obtain the probable number of right guesses, and (iv) subtract the probable 
number of right guesses from the total number right to obtain the score. In 
multiple-choice testing this results in the formula 

(1)    $S = R - [W/(n - 1)]$, 

in which $S$ is the corrected score, $R$ is the number right, $W$ is the number 
wrong, and $n$ is the number of choices available for each answer ([1], p. 518). 
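
As a minimal sketch of formula (1), the correction can be written in Python. The function name `corrected_score` and the example counts are illustrative only and do not come from the article.

```python
def corrected_score(right: int, wrong: int, n_choices: int) -> float:
    """Formula (1): S = R - W / (n - 1).

    right     -- number of items answered correctly (R)
    wrong     -- number of items answered incorrectly (W)
    n_choices -- number of alternatives offered per item (n)
    """
    if n_choices < 2:
        raise ValueError("each item needs at least two choices")
    return right - wrong / (n_choices - 1)


# Hypothetical example: 40 right and 12 wrong on four-choice items.
# Twelve errors imply about 12 / 3 = 4 lucky guesses among the 40 right.
example = corrected_score(40, 12, 4)
print(example)
```

With four choices, each wrong answer is taken to stand for one third of a right answer obtained by luck, which is why the deduction is W/(n − 1).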

However, in multiple-choice tests the selection of an answer to a question 
is essentially independent of the preceding selections, whereas each time 
the human filter selects an item the relative number of available items for 
right and wrong guesses changes and therefore the probability of his 
incorrectly guessing the next selection changes. In such a situation (selection 
without replacement) the probability that out of $n$ guesses exactly $w$ will be 
wrong is given by a hypergeometric distribution ([2], pp. 182-183) from 
which the expected number wrong $E(w)$ is found to be 

(2)    $E(w) = n s_b/(n + s_b)$. 

(The assumption is made here that no wrong selections are made knowingly, 
so that when the human filter starts guessing all the $s_b$ lower priority items 
are available, originally, for wrong guesses.) Now the difference between 
the total number of guesses and the expected number of wrong guesses must 
be the expected number of right guesses $E(a)$; that is, 

(3)    $E(a) = n - E(w) = E^2(w)/[s_b - E(w)]$. 
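
Equations (2) and (3) can be checked numerically. The sketch below is an illustration, not part of the original derivation: it assumes, as the form of (2) implies, that when guessing begins the pool holds n still-unselected correct items together with all the lower priority items. The function names and the example sizes are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_wrong(n: int, s_low: int) -> Fraction:
    """Mean number of wrong picks when n items are drawn without
    replacement from a pool of n correct and s_low lower priority items
    (hypergeometric distribution), computed term by term."""
    pool = n + s_low
    mean = Fraction(0)
    for w in range(0, min(n, s_low) + 1):
        prob = Fraction(comb(s_low, w) * comb(n, n - w), comb(pool, n))
        mean += w * prob
    return mean


def expected_right(e_w: Fraction, s_low: int) -> Fraction:
    """Right-hand form of equation (3): E(a) = E(w)^2 / (s_low - E(w))."""
    return e_w ** 2 / (s_low - e_w)


# Hypothetical sizes: 4 guesses, 6 lower priority items.
n, s_low = 4, 6
e_w = expected_wrong(n, s_low)
print(e_w, Fraction(n * s_low, n + s_low))   # both sides of equation (2)
print(n - e_w, expected_right(e_w, s_low))   # both forms of equation (3)
```

Exact fractions are used so that the two sides of each equation can be compared without rounding.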


It remains only to subtract the expected number of right guesses $E(a)$ from 
the total number of correct selections $t_{aa}$ to obtain the number of correct 
selections which were made coherently, i.e., by applying the rules and not 
guessing. For convenience in comparing different experiments the result 
should be normalized to the number of responses. Thus the resultant measure 
is 

(4)    $C = [t_{aa} - E(a)]/r_a$. 

Just as with multiple-choice test scoring, the best likelihood estimate of the 
theoretically expected number wrong $E(w)$ is the actual number wrong $t_{ba}$. 
Making this change in (3) and substituting the result in (4) yields 

(5)    $C = \frac{t_{aa}}{r_a} - \frac{t_{ba}^2}{r_a(s_b - t_{ba})}$. 

Let $L = s/r_a$ and $A = t_{aa}/r_a$. By algebraic manipulation of the filter 
parameters (Fig. 1), using $t_{ba} = r_a - t_{aa}$ and $s_b = s - r_a$, it can be 
shown that 

(6)    $C = \frac{LA - 1}{L + A - 2}$. 

The coherence measure $C$ is defined to be the normalized difference 
between the total number of correct selections and that portion of the correct 
selections which was obtained by guessing; thus $C$ as given in (6) cannot be 
negative. Setting (6) equal to zero and solving for $A$ shows that $A = 1/L$ for 
completely random activity on the part of the filter. But $A$ is simply the 
proportion of selections which are correct; this can go to zero experimentally. 
The significance of values of $A < 1/L$ will be discussed in the next section, 
but obviously the measure must be modified to include such situations. 
This can be accomplished by making a "mirror image" of the actual $A$ about 
the random "axis" and then handling this image like any $A > 1/L$. 

Consider a unit interval representing the number of selections made. 
Then for completely coherent activity, $A$ equals one equals the unit interval; 
for completely random activity, $A$ is equal to some sub-interval $1/L$. Now for 
$A < 1/L$ a corresponding $A'$ is found such that the variation of $A$ below 
$1/L$ is the same proportion of the interval $(0, 1/L)$ that the variation of $A'$ 
above $1/L$ is of the remaining interval $(1/L, 1)$. This image value is 

(7)    $A' = 1 + A - LA$. 


Substituting the expression for $A'$ from (7) in place of $A$ in (6) and changing 
the sign to indicate that the filter response is poorer than one would expect 
by chance yields 

(8)    $C = -\frac{LA - 1 + L(1 - LA)}{L + A - 2 + (1 - LA)}$,    $A < 1/L$. 

Here $C$ varies between 0 and $-1$, and $C$ as given in (6), which is used for 
$A > 1/L$, varies between 0 and +1. 

Discussion of the Measure and its Use 


As derived in the preceding section, the coherence measure is given by 

(6)    $C = \frac{LA - 1}{L + A - 2}$,    $A > 1/L$, and 

(8)    $C = -\frac{LA - 1 + L(1 - LA)}{L + A - 2 + (1 - LA)}$,    $A < 1/L$, 

where $L$ is the overload into which the filter is working and is equal to 
$s/r_a$, and $A$ is the proportion of selections which are correct, or 
$t_{aa}/r_a$. Thus the parameters $L$ and $A$ are determined from three 
experimentally available quantities: $s$ is the total number of items available 
for filtering, $r_a$ is the total number of selections, and $t_{aa}$ is the 
number of correct selections. 
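
The two branches of the measure can be combined in one short routine. This is a sketch under stated assumptions rather than the author's procedure: the function name `coherence`, its argument names, and the example counts are hypothetical, and the case A = 1/L is assigned to the upper branch, where (6) gives zero.

```python
def coherence(s: int, r_sel: int, t_correct: int) -> float:
    """Coherence measure C from equations (6), (7), and (8).

    s         -- total number of items available for filtering
    r_sel     -- total number of selections made (must be less than s)
    t_correct -- number of selections that were correct
    """
    if not 0 < r_sel < s:
        raise ValueError("an overload requires 0 < r_sel < s")
    if not 0 <= t_correct <= r_sel:
        raise ValueError("t_correct must lie between 0 and r_sel")
    L = s / r_sel            # normalized overload
    A = t_correct / r_sel    # proportion of selections that are correct
    if A >= 1 / L:
        return (L * A - 1) / (L + A - 2)             # equation (6)
    A_image = 1 + A - L * A                          # equation (7)
    return -(L * A_image - 1) / (L + A_image - 2)    # equation (8)


# Hypothetical experiment: 20 items presented, 5 selected.
for t in (5, 4, 0):
    print(t, round(coherence(20, 5, t), 4))
```

Writing the lower branch as (6) applied to the image value with the sign reversed keeps it term by term identical to (8).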

Either the actual load s or the normalized overload L is usually the 
independent or control variable; either the raw or the normalized number of 
correct selections ($t_{aa}$ or $A$) is the dependent variable of interest. The 
coherence measure is simply a corrected value of $A$ which takes random guessing 
into account. Since the probability of a wrong guess is a mathematical 
function of the overload, L appears in the measure. However, C is not a 
measure of the relationship between L and A. C varies between plus and 
minus one only because it is normalized to allow comparisons among several 
experiments and not because it is a coefficient of correlation or association. 

Coherence may be determined from experimental measurements, and 
thus the statistical behavior of $C$ is of interest. Since it is a corrected value 
of $A$, the probabilities of individual values of $C = C_i$ are equal to the 
probabilities of corresponding experimental values of $A = A_i$ from which they 
were computed according to (6) or (8). That is, 

(9)    $\Pr\{C = C_i\} = \Pr\{A = A_i\}$. 

Thus it follows directly ([2], p. 172) that the mean and the variation of $C$ 
can be computed in the usual way, even though the probabilities of individual 
values of $C$ are not measured directly. The mean and the variation exist for 
all $L > 1$, and when $L = 1$ there is no overload and thus no filtering. 

A negative value of coherence would appear to be logically inconsistent. 
However, it can occur for two reasons. The number of right guesses is de- 
termined from a computation in which the actual number of errors is taken 
as the best likelihood estimate of the theoretically expected number of 
errors. This is certainly the best estimate that can be made, but fluctuations 
about this value must be expected. Thus, if the human filter makes a dis- 
proportionately large number of bad guesses, the measure can actually go 
negative. Such fluctuations should be cancelled in the long run by the high 
values of C resulting when the filter makes a disproportionately large number 
of good guesses. 

Of more interest is the situation in which C is consistently negative. 
This can mean only that the average number of errors is greater than one 
would expect by chance, and, therefore, that the filter is actually using a set 
of rules differing at least partially from the prescribed set. Although a negative 
C cannot by itself indicate what set of rules the filter is using, its magnitude 
indicates the degree of difference—that is, a larger negative C implies a 
greater discrepancy from the prescribed set of rules. 

Applications 

The measure of coherence can be used for psychological studies of 
random behavior as a function of an overload of information, or as a function 
of other variables in the presence of an overload. But it may have more 
practical applications as well. For example, where several methods exist for 
determining priorities among items of input information, a plot of coherence 
versus overload for each method will reveal which method the subjects 
adhere to best and also how the methods compare at various levels of over- 
load. Also, it is quite possible that some individuals tend to be more coherent 
generally than others; if this is true a simple test designed to determine a man’s 
coherence level might permit prediction of his subsequent performance at 
filtering tasks. 

In fact, anytime one is concerned with adherence to rules or with partially 
random activity in overloaded information handling systems, the coherence 
measure should be a useful quantitative tool. 
This study reveals the usefulness of multiple correlation techniques in 
estimating the relative importance of different aspects of a tracking task in 
the operator’s tracking behavior. The technique is applied to a compensatory 
tracking task with a position control. 


This study is designed to ascertain the adequacy of multiple correlation 
techniques in estimating the relative importance of different aspects of a 
tracking situation to the operator’s tracking behavior. The tracking task 
chosen for analysis is a compensatory task in which the operator is required 
to manipulate a joy stick so as to keep an externally driven target dot centered 
on a hairline reference located on the face of a cathode ray tube. A direct 
position control is used so that the tracking error (i.e., the displacement of 
the dot from the hairline) is equal to the difference between the course dis- 
placement and the control displacement. 

In this task the operator might be capable of extracting usable infor- 
mation from the visual display in terms of the magnitude and direction of 
the displacement of the dot, the speed and direction in which the dot is 
moving, and possibly even changes in the speed. That is to say, the visual 
stimuli to be evaluated are the instantaneous values of the error $e$ and its 
first and second derivatives $\dot{e}$ and $\ddot{e}$. In addition, the operator 
may base his responses partly on information contained in the several 
components of the stick motion, i.e., instantaneous stick displacement $R$ and 
its first and second derivatives $\dot{R}$ and $\ddot{R}$. It is further assumed 
that each of these six 
variables would have its influence on performance after some delay, roughly 
analogous to a “reaction time” for each variable in question. These delay 
times, which must be revealed by the analysis, are left free to assume different 
values for each of the assumed variables. 


*Now with General Electric Advanced Electronics Center. 

†The authors wish to express their appreciation of the many persons who have so 
generously offered encouragement and advice. Particularly helpful were S. F. George and 
H. Glaser of the U. S. Naval Research Laboratory and W. J. McGill of the Massachusetts 
Institute of Technology. The authors are also deeply indebted to Miss Jean B. Henson, 
who performed the many long and laborious statistical computations, for which she de- 
serves more than this footnote. 
The specific measure taken as the performance criterion, or dependent 
variable, is the acceleration pattern of the stick movements. Intuitively, it 
seems that a response should be a change from one state to another. Use of 
the acceleration measure assumes that the operator maintains a rate of move- 
ment until an appreciation of the stimulus situation leads him to generate a 
different rate or direction of movement. Birmingham and Taylor [1], dis- 
cussing the physics involved, suggest that a measure related to the instan- 
taneous application of force might reasonably represent the operator’s 
output. They, too, conclude that with the combined mass of the arm and 
stick remaining constant, acceleration, or the second derivative of stick 
position with respect to time, would be the desired measure. 

The problem is to ascertain which variables actually do influence the 
operator’s performance and to ascertain the relative weight for each of the 
variables. More conventional experimental techniques would require holding 
all variables constant except one and then observing changes in performance 
resulting from controlled manipulation of the isolated variable. In a continuous 
tracking task the several independent stimulus variables are necessarily 
correlated; hence experimental isolation is not possible. Multiple correlation 
techniques, however, circumvent this difficulty, permitting statistical isolation 
even when experimental isolation is impossible (cf. [3] and [7]). This is true 
since a partial regression coefficient measures the regression of an independent 
variable on the dependent variable with the influence of other independent 
variables removed. 

The use of this index can best be illustrated by returning to the original 
problem posed for this study: to relate the performance measure $\ddot{R}_t$ to 
each of the stimulus variables $e$, $\dot{e}$, $\ddot{e}$, $R$, $\dot{R}$, and 
$\ddot{R}$ in a formulation which will indicate the relative importance of each. 
Two analyses will be performed: one using only the display variables, and the 
second using both display and stick-motion variables. The specific statements 
of these relationships take the forms: 

(1)    $\ddot{R}_t = a e_{t-\lambda_1} + b \dot{e}_{t-\lambda_2} + c \ddot{e}_{t-\lambda_3}$ 
(2)    $\ddot{R}_t = a e_{t-\lambda_1} + b \dot{e}_{t-\lambda_2} + c \ddot{e}_{t-\lambda_3} + d R_{t-\lambda_4} + f \dot{R}_{t-\lambda_5} + g \ddot{R}_{t-\lambda_6}$. 


Equations (1) and (2) are recognizable as linear differential equations 
and state that at time $t$ the operator generates an instantaneous acceleration 
proportional to the weighted sum of the error and its first two derivatives, 
or, the error and stick displacement and their first and second derivatives, 
each taken at its respective delay time ($t - \lambda$). As far as the correlational 
analysis is concerned, the fact that these are differential equations is irrelevant. 
Generalized, they are equivalent in form to the multiple regression equation: 

(3)    $X_1 = \beta_{12\cdot 34\cdots n} X_2 + \beta_{13\cdot 24\cdots n} X_3 + \cdots + \beta_{1n\cdot 23\cdots(n-1)} X_n$, 

where $X_1$ is the dependent or criterion variable, and $X_2, X_3, \cdots, X_n$ are 
the independent or predictor variables. The beta coefficients are the desired 
partial regression coefficients; they represent the net weight to be assigned to 
each predictor when the simultaneous effects of all the other predictors are 
held constant. If (1) and (2) are interpreted as multiple regression equations, 
the coefficients a, b, c, etc., regarded as partial regression coefficients, will 
represent the degree to which $\ddot{R}$ is dependent upon each variable in question 
with all the other variables in that equation held constant. That is, these 
coefficients will indicate the relative weight of each variable in determining 
the response. 

A check on the adequacy of any particular solution is afforded by the 
multiple correlation coefficient, which can be calculated for each solution. 
This index indicates the correlation between the observed values of the 
dependent variable and the predicted values obtained from the multiple 
regression equation, thereby providing a measure of the precision of the 
predictions. For example, if the multiple correlation for (2) should not be 
significantly higher than the multiple correlation for (1), this would indicate 
that only the display variables were active in determining the responses. 
But if the multiple correlation for (2) should be higher than that for (1) 
this would indicate that both display and movement variables are of im- 
portance in determining the responses. And if, in either case, the multiple 
correlation coefficient is small this would indicate that neither formulation 
is adequate and that truly critical variables probably had not been included. 
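
To make the relation between the beta coefficients and the multiple correlation concrete, the following sketch solves the normal equations for standardized variables directly from a table of correlations. It is an illustration only: the article reports using the Doolittle method, whereas plain Gaussian elimination is substituted here, and the function names and the correlation values in the example are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with
    partial pivoting (stand-in for the Doolittle hand method)."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def beta_weights(r_xx, r_xy):
    """Standardized partial regression coefficients: solve r_xx * beta = r_xy,
    where r_xx holds predictor intercorrelations and r_xy the correlation
    of each predictor with the criterion."""
    return solve_linear(r_xx, r_xy)


def multiple_r(r_xx, r_xy):
    """Multiple correlation: R^2 is the sum of beta_j * r_jy."""
    betas = beta_weights(r_xx, r_xy)
    return sum(b * r for b, r in zip(betas, r_xy)) ** 0.5


# Hypothetical correlations for two predictors and one criterion.
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_xy = [0.6, 0.4]
print(beta_weights(r_xx, r_xy), multiple_r(r_xx, r_xy))
```

Because the predictors here are correlated, each beta is smaller than the corresponding raw correlation; the comparison of nested equations described above amounts to asking whether adding rows to `r_xx` and `r_xy` raises the returned multiple correlation by more than chance would.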

There remains the matter of determining the appropriate lag associated 
with each independent variable—this presents a major problem in the 
evaluation of the above equations. These lags are found by first calculating 
the correlation functions between $\ddot{R}$ and each independent variable $e$, 
$\dot{e}$, etc., lagged over the period in which significant relationships are at all likely 
to be found. The spuriously high correlations at zero lag are ignored. The 
interval at which the maximum correlation between the dependent variable 
and the given independent variable occurs is selected as the appropriate lag 
for that predictor. An alternative procedure, which would avoid this some- 
what arbitrary selection of lags, involves trying all combinations of lags and 
accepting that combination resulting in the largest multiple correlation 
coefficient. Calculation of such solutions would have been a prodigious task 
quite beyond the practical limits of the experiment. The few additional 
likely combinations which were tried yielded lower multiple correlations, 
so the procedure of using lags based on the highest correlations was adopted. 
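
The lag-selection rule described above can be sketched as follows. This is an illustrative reading, not the original computation: the zero lag is skipped, the lag with the largest absolute correlation is taken, and the function names and the synthetic series are hypothetical.

```python
from math import sin, sqrt


def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(predictor, response, max_lag):
    """Correlation between predictor[t - k] and response[t] for each
    lag k = 1 .. max_lag (lag zero is deliberately left out)."""
    return {k: pearson(predictor[:-k], response[k:])
            for k in range(1, max_lag + 1)}


def best_lag(predictor, response, max_lag):
    """Lag, in samples, at which the absolute correlation is largest."""
    corr = lagged_correlations(predictor, response, max_lag)
    return max(corr, key=lambda k: abs(corr[k]))


# Synthetic check: the response is the negated predictor three samples later.
signal = [sin(0.37 * i) + 0.5 * sin(1.3 * i) for i in range(200)]
response = [0.0] * 3 + [-v for v in signal[:-3]]
lag = best_lag(signal, response, max_lag=10)
print(lag, lag * 0.06)  # lag in samples and in seconds at 0.06 sec. per sample
```

Choosing each lag separately in this way is the shortcut the authors adopted; the exhaustive alternative would call a regression routine once for every combination of lags.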


Method 
Apparatus 
A one-dimensional tracking system with auxiliary multi-channel record- 
ing equipment was set up on electronic analog computing equipment as 
outlined in the block diagram in Fig. 1. 
FIGURE 1 
Simplified Block Diagram of Apparatus 

The course input consisted of two oscillator-generated sine waves with 
periods of 5.12 sec. and 2.75 sec. added together in an amplitude ratio of 
2.75 to 1. The error signal was displayed on the horizontal axis of a 5-in. 
cathode ray tube mounted directly in front of the subject at a viewing distance 
of approximately 2 ft. The subject tracked by moving a horizontally mounted, 
moderately stiff, joy stick from left to right in a plane parallel to the desk 
which held the equipment. A 1° displacement of the stick produced a dis- 
placement of the error dot subtending a visual angle of .052°. Stick displace- 
ment was transduced into voltage output by use of a vacuum tube strain 
gauge, RCA 5734. Simultaneous recordings of $e$, $\dot{e}$, $\ddot{e}$, $R$, $\dot{R}$, and $\ddot{R}$ were taken 
with a six-channel Brush polygraph run at a paper speed of 1 cm./.24 sec. 
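
For readers who wish to reproduce a comparable course, the input can be approximated as the sum of two sinusoids. The sketch below rests on assumptions the text does not settle: the slower component is given the larger amplitude, both components start at zero phase, and the amplitude unit is arbitrary. The function name `course_input` is hypothetical.

```python
from math import pi, sin


def course_input(t: float, amp_slow: float = 2.75, amp_fast: float = 1.0) -> float:
    """Course displacement at time t (sec.): two sine waves with periods
    of 5.12 sec. and 2.75 sec. in an amplitude ratio of 2.75 to 1.
    Which component is larger, and the phases, are assumptions."""
    slow = amp_slow * sin(2 * pi * t / 5.12)
    fast = amp_fast * sin(2 * pi * t / 2.75)
    return slow + fast


# Sample the course every 0.06 sec., the reading interval used in the analysis.
samples = [course_input(0.06 * i) for i in range(310)]
print(min(samples), max(samples))
```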


Procedure 


Two naval enlisted men served as subjects and were given 22 sessions 
of five 1-min. trials over the course of seven days. An integrated error score 
was obtained for each trial as a gross measure of performance. Polygraph 
records were taken only during the last 30 sec. of the trials composing sessions 
21 and 22. By this time the learning curves appeared to have reached an 
asymptote. 

A representative record, free from artifacts, was selected for each subject 
and submitted to analysis. 


Analysis of Records 


Corresponding sections of the two records were read at intervals equiv- 
alent to 0.06 sec. Two independent readings were made and collated into a 
reasonably accurate final tabulation of the six variables at each of 310 points 
covering 18.9 sec. of tracking. A separate analysis was made for each subject. 

Correlation functions (Pearson product moment correlations) were then 
computed between each variable and every other variable over 25 lags of 
0.06 sec. or a total of 1.5 sec. The computations were programed on the 
NAREC digital computer. 


Results 


A sample tracking record of the six variables for S, is presented in Fig. 
2. The record for the other S had much the same appearance and does not 
reveal any obvious differences between the two Ss. However, the integrated 
error scores showed that S, was consistently better than S, . The correlation 
functions are also somewhat different for the two Ss as can be seen in Fig. 3, 
where the correlations between $\ddot{R}$ and each predictor are presented.* The 
primary difference seems to be in the nature of the dominant periodicities 
of the correlograms. Otherwise the corresponding functions seem remarkably 
similar, at least up to lags of about 0.75 sec., the interval of maximum interest. 
In both cases the highest correlations are observed between $\ddot{R}_t$ and $\dot{e}$ at only 
slightly different lags. Any given point on these curves must exceed an 
absolute value of .154 to reach the 1-per cent level of significance; due to the 
large number of correlations involved, a more stringent standard should be 
set. 

From these functions, particular raw correlation values and their lags 
were selected for use in evaluating equations (1) and (2). Since the tracker 
must generate responses opposite in direction to the displayed error in order 
to keep the dot on the hairline, the coefficients for $e$, $\dot{e}$, and $\ddot{e}$ in these multiple 
regression equations must be negative. There were no known a priori 
restrictions on the signs of movement variables, but, as it turned out, the 
highest correlations for these predictors were also negative. This being so, 
the highest negative correlations between $\ddot{R}_t$ and each predictor, together 
with the lags at which these correlations occurred, were selected for use in 
the analyses. Inspection of the scatter plots for these raw correlations sug- 
gested that the assumption of linear regression was met. These raw correlations 
were then processed by the Doolittle method [cf. 7] to solve the constants for 
(1) and (2). 

Entering the constants in (1) for $S_1$ yielded 

(4)    $\ddot{R}_t = -.095e - .621\dot{e} - .017\ddot{e}$, 

where the lags for the respective $e$'s are −.06, −.24, and −.36 sec. The 
corresponding result for $S_2$ is 

(5)    $\ddot{R}_t = -.192e - .455\dot{e} - .014\ddot{e}$, 

where the lags for the respective $e$'s are −.06, −.18, and −.24 sec. The 
multiple correlation coefficients are .675 for $S_1$ and .566 for $S_2$. The amount 
of variance in $\ddot{R}_t$ accounted for in these multiple correlations is about 46 
per cent and 32 per cent, respectively. The predictions are considerably 
better than chance but still represent rather crude approximations. 

FIGURE 2 
A Typical Portion of the Tracking Record for S, 

FIGURE 3 
Plots of raw correlations between each independent variable and the dependent variable 
($\ddot{R}_t$) as a function of lag time. The broken line represents S,’s correlations and the solid 
line represents S,’s correlations. 

*The NAREC digital computer provided intercorrelations for all combinations of 
the six variables each at 26 lags ranging from 0.00 to 1.50 sec. A complete tabulation of 
these intercorrelations has been deposited with the American Documentation Institute. 
Order Document No. 5206 from the ADI Auxiliary Publications Project, Photoduplication 
Service, Library of Congress, Washington 25, D. C., remitting in advance $1.75 for 35 mm. 
microfilm or $2.50 for 6 × 8 in. photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 

The data presented in Tables 1 and 2 show that in the equation for 
$S_1$ only the partial regression coefficient for $\dot{e}$ is significantly different 
from zero. In the formulation for $S_2$ the coefficients for $e$ and $\dot{e}$ are both 
significantly different from zero. Furthermore, of the total amount of variance 
accounted for by these equations, in the case of $S_1$ about 85 per cent is 
attributable to variations in $\dot{e}$, and in the case of $S_2$, 69 per cent is 
attributable to $\dot{e}$ and only 29 per cent to $e$. 


TABLE 1 

Results of the Partial Correlation Analysis for $S_1$ 
Using Only the Display Variables 

Variable   | Lag (sec.) | Raw Correlation | Regression Coefficient (β) | Standard Error of β | t      | Significance Level of β | Partial Correlation* 
$e$        | .06        | -.374           | -.095                      | .050                | 1.919  | -                       | -.113 
$\dot{e}$  | .24        | -.675           | -.621                      | .054                | 11.500 | .01                     | -.567 
$\ddot{e}$ | .36        | -.282           | -.017                      | .049                | 0.349  | -                       | -.010 


TABLE 2 

Results of the Partial Correlation Analysis for $S_2$ 
Using Only the Display Variables 

Variable   | Lag (sec.) | Raw Correlation | Regression Coefficient (β) | Standard Error of β | t      | Significance Level of β | Partial Correlation* 
$e$        | .06        | -.384           | -.192                      | .054                | 3.529  | .01                     | -.207 
$\dot{e}$  | .18        | -.543           | -.455                      | .060                | 7.533  | .01                     | -.411 
$\ddot{e}$ | .24        | -.234           | -.014                      | .055                | 0.253  | -                       | -.017 

*In Tables 1, 2, 3, and 4 partial correlation as well as partial regression coefficients are 
presented because many readers may be more familiar with the correlation index. In 
making statements about the dependency of the criterion variables on the predictor 
variables the regression coefficients are preferable (cf. [3]), although here the 
correlation and regression coefficients are nearly identical, as one would expect, and should 
offer no problem of interpretation. 

Solutions for (2), where both display and movement variables are 
included, yielded for S, 


(6) R, = —.579e — .116é — .054é — .335R — .450R — .046R 
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where the lags for the respective e’s are —.06, —.24, and —.36 sec., and the 
lags for the respective R’s are —.06, —.24, and —.36 sec. For S, 

(7) R, = —.498e — .182é — .014é — .352R — .355R — .026R 

where the lags for the respective e’s are —.06, —.18, and —.24 sec., and the 
lags for the respective R’s are —.06, —.24, and —.24 sec. The multiple 
correlations when all predictors are used are .757 for S, and .722 for S, . The 
amount of variance accounted for is 57 per cent and 52 per cent, respectively. 
An F test [cf. 6] reveals that these multiple correlations are significantly 
larger (p < .001) than those obtained by using only the display variables 
and indicate that the six-variable equations should be regarded as closer
approximations to an adequate description of the factors influencing the
operator’s tracking performance.
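
The percentages quoted above follow directly from squaring the multiple correlations, and the comparison of the six-variable and three-variable equations rests on an F ratio for the gain in R squared. The sketch below illustrates both computations. It is only a sketch under stated assumptions: the function names are hypothetical, and the sample size and the reduced-model correlation in the usage lines are placeholder values, since neither is stated in this passage.

```python
def variance_accounted(multiple_r: float) -> float:
    """Proportion of criterion variance accounted for by a multiple correlation."""
    return multiple_r ** 2


def f_increment(r_full: float, r_reduced: float, n: int,
                k_full: int, k_reduced: int) -> float:
    """F ratio for the gain in R^2 when k_full - k_reduced predictors are added.

    n is the number of observations. It is an assumption here; the passage
    does not report it.
    """
    r2_full, r2_reduced = r_full ** 2, r_reduced ** 2
    df_num = k_full - k_reduced          # number of added predictors
    df_den = n - k_full - 1              # residual degrees of freedom
    if df_num <= 0 or df_den <= 0:
        raise ValueError("need k_full > k_reduced and n > k_full + 1")
    return ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)


if __name__ == "__main__":
    # Multiple correlations reported in the text for the two subjects.
    for subject, r in (("S1", 0.757), ("S2", 0.722)):
        print(subject, round(100 * variance_accounted(r)), "per cent")

    # Placeholder inputs: r_reduced and n are NOT taken from the paper.
    print(f_increment(0.757, 0.60, n=300, k_full=6, k_reduced=3))
```

With the reported correlations the first two lines give 57 and 52 per cent, matching the text; the F value printed last depends entirely on the placeholder inputs.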

The most striking feature of these equations is the large shift in the
regression coefficients for $e$ and $\dot{e}$ resulting from the inclusion of the movement
variables. $e$ is the single most important determiner of $\ddot{R}_0$, accounting for
37 per cent of the predictable $\ddot{R}_0$ variation in the case of $S_1$ and 35 per cent
in the case of $S_2$. On the other hand, $\dot{e}$ is a negligible factor in the equation
for $S_1$, and in the equation for $S_2$ it accounts for only 13 per cent of the total
predictable $\ddot{R}_0$ variance, even though the regression coefficient is reliably
greater than zero (see Tables 3 and 4). In both cases, $R$ and $\dot{R}$ now assume
considerable importance. The amount of predictable variation attributable
to the $R$ factor is 21 per cent for $S_1$ and 25 per cent for $S_2$. Comparable figures
for $\dot{R}$ are 28 per cent for $S_1$ and 25 per cent for $S_2$. Thus, in this tracking task
at least, the acceleration pattern of the control movements seems to be
determined largely by $e$, $R$, $\dot{R}$, and possibly to a lesser extent by $\dot{e}$. The
adequacy of this formulation is shown graphically in Fig. 4, where the pre- 
dicted points are plotted against two samples of actual tracking record for 
S, . The greatest error appears to be in a failure to predict a high frequency 
component of the response, although the fit to a smoothed record would 
probably be quite good. 


Discussion 


The results indicate the degree to which the acceleration pattern of the 
control movements can be predicted assuming known values of the other 
variables at some time previous, when these variables are taken singly or 
in combination according to some assumed first-order linear scheme. 

The individual correlations show that if a prediction is to be made from
only a single variable, the best results can be obtained from values of error
velocity $\dot{e}$ taken at a lag of about .20 sec. The solutions of (1) indicate that
only slightly better predictions can be made using all three error measures,
but that $\dot{e}$ still carries the most weight. The regression equations including
both $e$ and $R$ measures give considerably better predictions and indicate

TABLE 3

Results of the Partial Correlation Analysis for $S_1$ Using Both
the Display and the Control Movement Variables

Variable     Lag   Raw          Regression       Standard    t      Significance  Partial
                   Correlation  Coefficient (β)  Error of β         Level of β    Correlation
$e$          .06   -.374        -.579            .078        7.423  .01           -.406
$\dot{e}$    .24   -.675        -.116            .081        1.429  -             -.089
$\ddot{e}$   .36   -.282        -.054            .058        0.931  -             -.059
$R$          .06   -.361        -.335            .051        6.569  .01           -.347
$\dot{R}$    .24   -.298        -.450            .072        6.276  .01           -.352
$\ddot{R}$   .36   -.226        -.046            .068        0.672  -             -.051

TABLE 4

Results of the Partial Correlation Analysis for $S_2$ Using Both
the Display and the Control Movement Variables

Variable     Lag   Raw          Regression       Standard    t      Significance  Partial
                   Correlation  Coefficient (β)  Error of β         Level of β    Correlation
$e$          .06   -.384        -.498            .053        9.396  .01           -.478
$\dot{e}$    .18   -.543        -.182            .067        2.716  .01           -.157
$\ddot{e}$   .24   -.234        -.014            .074         .189  -             -.021
$R$          .06   -.295        -.352            .054        6.518  .01           -.361
$\dot{R}$    .24   -.327        -.355            .058         .355  .01           -.389
$\ddot{R}$   .24   -.271        -.026            .096         .026  -             -.020

shifts in the relative weightings given to the $e$ variables. It is now found that
$\ddot{R}_0$ is most heavily dependent upon the error .06 sec. earlier, together with the
simultaneous position of the stick, also .06 sec. earlier, and its velocity,
about .24 sec. earlier.

The multiple correlations found in this study, .76 and .72, are rather 
high. Even so, a sizable amount of residual variance remains to be explained 
either as systematic variability due to variables not included in the study 
or as random measurement error variance. Certainly measurement error 
is expected from a number of sources. Some variable error is attributable to 
the recording equipment. Also, analog computers are subject to drift and are 
particularly noisy when performing double differentiations. Needless to say, 
precautions were taken to keep this source of variation as small as possible. 
Undoubtedly some error was involved in reading the records. This factor 
was checked by re-reading one hundred points on the R and é records for 
S, , calculating reliability coefficients, and determining the theoretical 
Figure 4
Two typical samples of S,’s R record with points predicted by the multiple regression 
equation using both display and control movement variables. 


correlation between $\ddot{R}_0$ and $\dot{e}$ (with lag -.06) corrected for attenuation due
to measurement errors [cf. 7]. The reliability coefficients were .986 for R
and .994 for $\dot{e}$, and the raw correlation between $\ddot{R}_0$ and $\dot{e}$ (with lag -.06),
had it been corrected for attenuation, would have changed from -.543 to
-.548. Although the correction is small, it nevertheless illustrates that some
small part of the residual variance may be accounted for by errors in record 
reading. A related non-systematic source of error was introduced by the .06 
sec. grain used in quantizing the continuous records and in lagging the 
correlations. The point of maximum correlation between any two variables 
is estimated to within .03 sec. This factor may have affected slightly not only 
the size of the maximum correlations but also the choice of the lags and the 
values of the intercorrelations. The net effect may have been to lower some- 
what the size of the partial and multiple correlations. Finally, the moment- 
to-moment variation of the subject’s behavior, a characteristic to be expected 
of all human behavior, constituted still another source of random variance. 
These several sources of variance would generally tend to lower the raw, 
partial, and multiple correlations, although they should have little influence 
on the relative importance assigned to each variable. It is not possible to 
determine whether or not the above sources are responsible for all of the 
residual variance. In analyses such as these a possibility always remains that 
some additional variable may be active in determining the response. For 
example, comparison of the predicted and obtained points in Fig. 4 suggests
that a high frequency component of $\ddot{R}_0$, approximately 10 cycles per sec.,
is not being predicted. These frequencies are about those of normal hand 
tremor. Effectively, the tremor, if indeed this is the source of these high 
frequencies, has been treated as a random noise source and no attempt has 
been made to account for it by factors in the multiple regression equations. 
At any rate this is a variable which, though possibly a significant source of 
variance, would not be expected to alter the relative influences of the other 
variables. 
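
The correction for attenuation quoted in this paragraph can be reproduced with a few lines of arithmetic. The sketch below applies the standard formula, the raw correlation divided by the square root of the product of the two reliability coefficients, to the figures given in the text. The function name is hypothetical and is used here only for illustration.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Raw correlation divided by the geometric mean of the two reliabilities."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Figures quoted in the text: raw r = -.543, reliabilities .986 and .994.
corrected = correct_for_attenuation(-0.543, 0.986, 0.994)
print(f"{corrected:.3f}")  # -0.548
```

Because both reliabilities are close to 1.00, the divisor is about .99 and the correlation moves only in the third decimal place, which is why record-reading error can explain no more than a small part of the residual variance.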

The relative weights obtained when the R variables are included have 
considerable intuitive appeal. It seems quite reasonable that the operator’s 
response should depend primarily upon the displacement of the dot only 
when movements already under way are taken into account. When only the 
display variables are included in the analysis, the displacement appears to 
have negligible influence and the velocity of the dot is given a larger weighting. 
This occurs because normally $\dot{e}$ is highly redundant with the $R$ variables;
33 per cent of the variance in $\dot{e}$ is common to the $R$ variables for $S_1$, and for
$S_2$ the figure is 43 per cent. Including the response variables in the analysis
allows this redundancy to be partialled out. It is revealed that the major
source of display information is the displacement, and that this information
is of use only in combination with response information from $R$ and $\dot{R}$.

The maximum correlation of $\ddot{R}_0$ with $e$ and with $R$ occurs in both cases
at .06 sec., which is extremely short compared to the typical “choice” reaction
times [8]. It may be that a continuous tracking task with a fairly simple
course provides a high degree of “readiness to respond” and thus might well
involve lower than usual reaction times.

The utility of this proposed correlational analysis technique is best 
seen by comparing it with other attempts to discover the factors determining 
the operator’s performance [2, 4, 5]. These attempts derived largely from the 
frequency analysis techniques of determining transfer functions in engineering 
practice and involve comparing the amplitude of each input frequency with 
the amplitude of the same frequency and its harmonics at the output. Frequencies
other than these are defined as “noise” and represent non-linearities
in the response characteristics which the techniques fail to account for. It
will be noticed that this approach parallels the present analysis wherein the 
display alone was considered. Frequency analysis techniques and the corre- 
lation techniques both show that the display variables alone do permit a low 
order, approximate prediction of the operator’s performance and therefore 
may have some practical utility. However, the results of the present study 
demonstrate that there is considerable danger of obtaining a badly distorted 
view when only the display variables are used. If one is interested in knowing 
the relative importance of different display variables, then omitting the R 
variables leaves the possibility that the relative weights obtained will be 
confounded by the influence of the R variables. An additional advantage of
the correlation technique when the R variables are included is that it accounts
for at least some of the sources of non-linearity. Thus, from a strictly practical
point of view, the correlation technique gives a closer approximation to the
operator’s behavior and might, therefore, provide a closer prediction of how 
the operator will fit into a given system. 

The multiple correlation techniques should also be useful in investigations 
of changes in performance which might occur when different controls are 
employed. What variables determine the response when an acceleration or 
velocity control is used rather than the position control used here? What 
variables determine the response when an aided or quickened control [cf. 1]
is used? It may be that performance with a position control and with a 
properly aided control are dependent upon the same variables. The correla- 
tional analyses could be used to test this hypothesis. In addition, the technique 
could be used to describe changes in the dependencies with learning and the 
manner in which these dependencies alter when an operator is transferred 
from one type of control to another. Furthermore, the technique should be
useful in evaluating or formulating theories of human tracking behavior. 
These and other uses of the present technique could contribute to a clari- 
fication of the nature of different tracking tasks and provide estimations of 
how the human operator fits into different kinds of tracking systems. 
A MEASURE OF THE GAMBLING RESPONSE-SET IN
OBJECTIVE TESTS


ROBERT C. ZILLER
UNIVERSITY OF DELAWARE


A formula is developed for measuring the gambling response-set or 
utility for risk in objective tests in which the testees are apprised of the 
application of a correction for guessing. Some implications of this measure 
for test theory and construction are discussed briefly. 


In 1950 Gulliksen [3] suggested that useful scores might be developed 
from the number of items skipped in an objective examination. Moreover, 
he suggested that, “Such scores, coupled with careful test directions, may
indicate ‘cautiousness’ or some similar personality characteristic of the 
subjects.” Actually three earlier studies on this topic had been reported 
[6, 7, 9] and two studies have been reported recently [5, 8]. This paper sum- 
marizes these studies, develops a formula which provides a measure of the 
personality characteristic indicated, and discusses some implications of this 
measure for test theory and construction. 

In 1936 Votaw [9] presented data demonstrating a relationship between 
measures of dominance-submission and guessing behavior on a test in which 
the subjects were directed not to guess. In this report the measure of guessing 
behavior was simply the sum of the unattempted items. 

An improved measure of guessing behavior was reported by Swineford 
in 1938 [6]. Through special directions on an achievement test, subjects were 
required to indicate their confidence in their response to each item by select- 
ing the amount of credit desired (2, 3, or 4 units). Double the credit claimed 
was subtracted if the response was incorrect. The “gambling tendency”
was derived from the following formula: 


$$G = \frac{\text{Errors marked 4} \times 100}{\text{Total errors} + \text{4 omissions}}. \tag{1}$$


In this index it was assumed that all errors were guesses. The reported 
reliability of this measure was .796. 

In 1941, Swineford [7] subjected this index to further analysis and 
concluded that ability in the field covered by the test is independent of the 
tendency to gamble. Moreover, the results suggested that familiarity with
the material has some effect on the tendency to gamble. New types of material 
and tests seemed to encourage gambling or guessing behavior. 

More recently [5] an index of maladjustment derived from selected 
MMPI items was found to correlate significantly and negatively with guessing
behavior on an achievement examination in which the use of a correction 
formula was announced. It was revealed that the more maladjusted students 
omitted more items and omitted a larger proportion of items than they were 
capable of answering correctly. 

In a recent investigation, Torrance and Ziller [8] also showed that guessing
behavior on an achievement examination in which the subjects were informed
of the intended use of a correction formula is correlated with personality
variables derived from a biographical inventory. A “high risker” appears to
be a self-confident, physically and socially adequate, competitive, self- 
expressive, secure individual who strongly identifies with the masculine role. 
(The subjects were Air Force aircrew members.) 

In general, it must be concluded that in examinations in which a correc- 
tion formula is imposed with the knowledge of the subjects, error variance 
is introduced in the form of a personality trait. This trait may be tentatively 
described as “utility for risk.” The index predicts decision-making behavior
in situations where there is a choice between two alternatives, A and B, 
where A offers a greater reward than B in the event of success but also greater 
punishment than B in the event of a loss. The person with a high index may 
be said to have greater psychological resources at his disposal with which to 
tolerate the consequences of misfortune. Thus, adjustment is operationally 
defined in terms of decision-making behavior. 

To date, no index has been presented which provides a measure of risk 
acceptance for any objective examination which employs a correction 
formula. Swineford’s approach was in the desired direction. However, the 
formula was not developed from a clearly defined theoretical framework and 
applies only to a special type of test. As a result, the formula is not sensitive 
to all the possible variances attributable to risk-acceptance. 

For these reasons the following formula was developed:*

$$R = \frac{[n/(n - 1)]W}{[n/(n - 1)]W + U} \tag{2}$$

where R = the index of risk-acceptance,
      n = the number of alternatives,
      W = the number of incorrect responses,
      U = the number of items omitted.

Theoretically, this formula represents a ratio of the number of items
upon which the subject guessed to the total number of items the subject did
not know, but upon which he could have guessed. That is:

$$R' = \frac{\text{True number of questions guessed}}{\text{True number of questions not known}} \tag{3}$$

*The author wishes to acknowledge the assistance of Thornton Roby, Tufts University,
in the development of this index.

It is assumed that an incorrect response is entirely a chance response, and 
that all the alternatives and items are equally difficult. It should be empha- 
sized, however, that R in (2) is an estimate of R’, and therefore subject to 
chance fluctuations. Thus, when the total number of questions guessed is 
large, R may be expected to be close to R’. However, when the total number 
of questions guessed is small, the numerical difference between R’ and R 
may be considerable; that is, the measure may lack reliability under this 
condition. Furthermore, it should be noted that the general formula can not 
be employed in the case where W = U = 0. 

Thus, the number of guesses may be expressed in terms of the number of 
errors. For example, with reference to a true-false examination or test with 
two-alternative items, the number of errors is only one-half of the chance 
responses, assuming that the remaining half of the responses were correct 
guesses. 


Let G = the number of guesses. Then

$$W = \tfrac{1}{2}G, \qquad G = 2W, \tag{4}$$

and R = 2W/(2W + U).
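
The general formula (2) and its two-alternative special case can be sketched in code as follows. This is a minimal illustration, not the author's own procedure: the function name and argument names are hypothetical, and the function simply refuses the case W = U = 0, in which the text notes that the formula cannot be employed.

```python
def risk_acceptance(n_alternatives: int, wrong: int, omitted: int) -> float:
    """Index R of formula (2): estimated guesses over estimated unknown items.

    The number of guesses is estimated as [n/(n - 1)] * W, which assumes, as
    the text does, that every wrong answer is a pure chance response.
    """
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    if wrong < 0 or omitted < 0:
        raise ValueError("counts cannot be negative")
    estimated_guesses = n_alternatives * wrong / (n_alternatives - 1)
    not_known = estimated_guesses + omitted
    if not_known == 0:
        raise ValueError("index is undefined when W = U = 0")
    return estimated_guesses / not_known


# True-false test (n = 2): R reduces to 2W / (2W + U).
print(risk_acceptance(2, wrong=10, omitted=5))   # 0.8
# Four-alternative items: guesses estimated as (4/3) * W.
print(risk_acceptance(4, wrong=9, omitted=4))    # 0.75
```

An examinee who omits every item he does not know obtains R = 0, and one who never omits obtains R = 1; with few wrong or omitted items the index rests on very little data, which is the reliability caveat raised earlier.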


Risk behavior, as measured by the general formula, is necessarily a 
factor in any examination involving an announced correction for guessing. 
Moreover, since it has been demonstrated to be a personality correlate, it 
contributes only to the error variance of achievement examination scores. 
This has been recognized for some time, even though it is not often considered
when modified response methods for multiple choice items are suggested 
[1, 2]. However, the measure derived through the risk-acceptance formula 
permits more systematic analysis of problems concerning the effects of 
guessing on test scores. 

From the general formula it is apparent that risk-acceptance is a function 
of the number of items omitted U and the number of items marked incorrectly 
W. Necessarily, U and W are functions of the difficulty of the test items. 
Therefore, test score variance attributable to variance in risk-acceptance 
is a direct function of the difficulty of the test items. As the difficulty of an 
examination approaches .00, that is, when the items cannot be answered on 
the basis of knowledge, understanding, etc., risk-acceptance becomes the 
single correlate of the examination score. 

Thus, under conditions in which a correction for guessing is imposed, 
the element of risk is introduced and leads to increasing error variance with 
increasing test difficulty. However, research to date indicates that increasing 
test difficulty to a given level under constant conditions of spread of difficulty 
and item correlation will also increase test validity [4]. Yet most of the 
analytical investigators summarized in this report employ mathematical 
models which do not consider the condition of risk described here. Empirical 
studies of this problem are indicated. On the basis of the foregoing discussion,
it may be predicted that the optimal level of difficulty under the invariant 
conditions of spread of difficulty and item intercorrelation is somewhat less 
under conditions of risk than under conditions in which the test score is simply 
the number of correct responses. 

Aside from the implications for test theory and construction, the formula 
provides a unique and useful measure of a personality trait which may be 
derived easily from any objective test with appropriate directions. As indi- 
cated, however, a very difficult yet seemingly appropriate examination 
provides a more reliable and valid measure of the risk factor. Such an instru- 
ment is, in fact, a situation test of personality. 


REFERENCES 


[1] Coombs, C. H. On the use of objective examinations. Educ. psychol. Measmt, 1953, 13, 
308-310. 

[2] Dressel, P. L. and Schmid, J. Some modifications of the multiple-choice item. Educ.
psychol. Measmt, 1953, 13, 574-594.

[3] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950. 

[4] Loevinger, J. The attenuation paradox in test theory. Psychol. Bull., 1954, 51, 493-504. 

[5] Sherriffs, A. C. and Bloomer, D. V. Who is penalized by the penalty for guessing?
J. educ. Psychol., 1954, 45, 81-90. 

[6] Swineford, Frances. The measurement of a personality trait. J. educ. Psychol., 1938, 29, 
295-300. 

[7] Swineford, Frances. Analysis of a personality trait. J. educ. Psychol., 1941, 32, 438-444. 

[8] Torrance, E. P. and Ziller, R. C. Risk and life experience: development of a scale for
measuring risk-taking tendencies. Lackland Air Force Base, Texas: Air Force Personnel 
and Training Research Center, February, 1957. (Research Report, AFPTRC-TN-57- 
23, ASTIA, Document No. 07892C.) 

[9] Votaw, D. F. The effect of do-not-guess directions on the validity of true-false or
multiple choice tests. J. educ. Psychol., 1936, 27, 698-703.


Manuscript received 1/29/57 
Revised manuscript received 3/15/57 








PSYCHOMETRIKA—VOL. 22, No. 3 
SEPTEMBER, 1957 


THE UPPER AND LOWER TWENTY-SEVEN PER CENT RULE 


EDWARD E. CURETON


UNIVERSITY OF TENNESSEE 


A simplified re-derivation of the formula underlying the rule is pre- 
sented, followed by a derivation of the comparable rule for the unit- 
rectangular distribution, which turns out to be a 33-per cent rule. Critical 
comments are offered concerning two assumptions: normality of the score 
distribution and equality of mean standard errors of measurement in the 
high and low groups. 


While the use of upper and lower subgroups each containing 27 per cent 
of the total group is quite common in item analysis, it is interesting to note 
that Kelley’s original proof [1, 2] has not been examined critically, and so 
far as the writer is aware, the derivation does not appear in any textbooks 
on psychological and educational statistics other than Kelley’s ([3], pp. 300- 
301). A simplified derivation is offered herewith, followed by a few critical 
comments. 

Let s be the standard error of measurement of the criterion scores (the 
standard response error of a single score). It will appear later that this 
standard error of measurement may be either that appropriate to raw scores 
or that appropriate to regressed scores without altering the argument. Then 
the standard response error of the mean of all cases in one subgroup will 
be $s/\sqrt{q}$ for q cases in the subgroup. In particular, the unit-normal distribution
contains one case in the total group, and q, the proportion in one
tail, is fractional without affecting the validity of the previous statement.

In the unit-normal distribution, the distance from the mean of the
whole distribution to the mean of one tail is $z/q$, z being the unit-normal
ordinate at the baseline point x which separates the tail from the rest of
the distribution. The distance from the mean of the lower tail to the mean
of the upper tail is therefore $2z/q$ for symmetrical tails, and the standard
response error of this difference is $s\sqrt{2/q}$. The critical ratio is

$$CR = \frac{2z/q}{s\sqrt{2/q}} = \frac{\sqrt{2}}{s}\,\frac{z}{\sqrt{q}}. \tag{1}$$

The problem is to maximize CR by choice of x (and hence of q and of z, both
of which are functionally related to x). The factor $\sqrt{2}/s$ is assumed not to
vary with x (or q or z), so the problem reduces to maximizing

$$f = z/\sqrt{q}, \tag{2}$$

and the formula defining s does not enter.

$$\frac{df}{dx} = \frac{1}{\sqrt{q}}\left(\frac{dz}{dx} - \frac{z}{2q}\,\frac{dq}{dx}\right). \tag{3}$$

Now

$$z = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \tag{4}$$

and from (4),

$$\frac{dz}{dx} = -xz. \tag{5}$$

Also

$$q = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-X^2/2}\,dX,$$

where X is any baseline value from x to $\infty$. Then at X = x, by the fundamental
theorem of the calculus,

$$\frac{dq}{dx} = -\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2},$$

and from (4),

$$\frac{dq}{dx} = -z. \tag{6}$$

Substituting from (5) and (6) in (3),

$$\frac{df}{dx} = \frac{1}{\sqrt{q}}\left(-xz + \frac{z^2}{2q}\right). \tag{7}$$

The condition for a maximum, which is obtained by setting the derivative
equal to zero, is then $z^2/2q = xz$, or

$$z/qx = 2. \tag{8}$$


The Kelley-Wood table of the normal probability integral gives values
of z and x correct to six decimal places for argument q correct to three
decimals, and we find the following adjacent entries:

   q         x          z         z/qx
 .270     .612813    .330646    1.99835
 .271     .609791    .331257    2.00454

The values of z/qx are computed from the tabled values of q, x, and z. Linear
interpolation yields q = .270267 at z/qx = 2, and the rounded figure .27
is actually correct to three decimals.
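
The table lookup and interpolation above can be checked numerically. The sketch below solves condition (8), z/qx = 2, by bisection, using only the Python standard library for the unit-normal ordinate and tail area. The function names, the bracketing interval, and the tolerance are illustrative assumptions and are not part of the original derivation.

```python
import math


def ordinate(x: float) -> float:
    """Unit-normal ordinate z at baseline point x, as in equation (4)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x: float) -> float:
    """Proportion q of the unit-normal distribution lying above x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def optimal_tail(lo: float = 0.3, hi: float = 1.5, tol: float = 1e-12):
    """Solve z / (q * x) = 2 for x by bisection; return (x, q)."""
    def g(x: float) -> float:
        return ordinate(x) / (upper_tail(x) * x) - 2.0

    if g(lo) * g(hi) > 0:
        raise ValueError("the interval does not bracket a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Keep the half-interval on which g changes sign.
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    return x, upper_tail(x)


if __name__ == "__main__":
    x, q = optimal_tail()
    print(f"x = {x:.4f}, q = {q:.4f}")
```

Run as written, the root falls between the two tabled entries, with q close to .2703, in agreement with the interpolated value quoted in the text.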

The assumption that the distribution of criterion scores is normal is 
well known. Moderate departures from symmetry should have little effect, 
but platykurtosis, which is usually found in distributions of scores on ex- 
perimental tests (when we use the criterion of internal consistency), and 
is to be expected [4], might change the picture appreciably. Consider the 
extreme case: the unit-rectangular distribution. This distribution, like the
unit-normal distribution, is defined as having area = 1 and $\sigma = \sigma^2 = 1$. Its
ordinate z is constant, and its baseline has finite limits $X = \pm a$, equidistant
from X = 0. For one-half the distribution, the area (the total frequency) is

$$\int_0^a z\,dX,$$

and the second moment, corresponding to $(1/2)\sum (X - \bar{X})^2/N$ in a discrete
distribution, is

$$\int_0^a X^2 z\,dX.$$

The standard deviation is therefore

$$\sigma = \sigma^2 = \frac{2\int_0^a X^2 z\,dX}{2\int_0^a z\,dX}.$$

Since z is a constant,

$$\sigma = \sigma^2 = \frac{\int_0^a X^2\,dX}{\int_0^a dX}.$$

Hence $\sigma = \sigma^2 = a^3/3a = a^2/3$, and since $\sigma = \sigma^2 = 1$, $a^2 = 3$, and

$$a = \sqrt{3}. \tag{9}$$

Also, since the area is unity, $az = 1/2$, and

$$z = \frac{1}{2\sqrt{3}}. \tag{10}$$

The distance from the mean of the whole distribution to the mean of one
tail is $(x + a)/2$, where x is again the baseline point separating the tail from
the rest of the distribution, and the distance between the means of the two
symmetrical tails is $x + a$. This distance has standard response error $s\sqrt{2/q}$
as before, and $q = (a - x)z$. The critical ratio is then

$$CR = \frac{(x + a)\sqrt{(a - x)z}}{s\sqrt{2}}, \tag{11}$$

and since both s and z are constants, the function to be maximized is

$$f = (x + a)\sqrt{a - x}. \tag{12}$$

$$\frac{df}{dx} = -\frac{x + a}{2\sqrt{a - x}} + \sqrt{a - x},$$

and setting this derivative equal to zero,

$$x = a/3. \tag{13}$$

Then from (9), $x = 1/\sqrt{3}$, and since $q = (a - x)z$, and from (10),
$z = 1/(2\sqrt{3})$,

$$q = \left(\sqrt{3} - \frac{1}{\sqrt{3}}\right)\frac{1}{2\sqrt{3}} = \frac{1}{3}. \tag{14}$$

For moderately platykurtic distributions of criterion scores, therefore, the 
subgroups should probably consist of the upper and lower 29 or 30 per cent 
of the total group. 
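
The result for the unit-rectangular distribution can be confirmed by brute force. The sketch below evaluates the function of equation (12) on a fine grid of cutting points and reports the maximizing x together with the corresponding tail proportion q = (a - x)z. The grid size and all names are illustrative assumptions; the constants a and z come from equations (9) and (10).

```python
import math

A = math.sqrt(3.0)                  # half-range a, equation (9)
Z = 1.0 / (2.0 * math.sqrt(3.0))    # constant ordinate z, equation (10)


def f(x: float) -> float:
    """Function to be maximized, equation (12): (x + a) * sqrt(a - x)."""
    return (x + A) * math.sqrt(A - x)


def best_cut(steps: int = 200_000):
    """Grid search over 0 <= x < a; return the best x and its tail proportion q."""
    best_x = max((A * i / steps for i in range(steps)), key=f)
    return best_x, (A - best_x) * Z


if __name__ == "__main__":
    x, q = best_cut()
    print(f"x = {x:.4f} (a/3 = {A / 3:.4f}), q = {q:.4f}")
```

The grid maximum sits at x = a/3 to within the grid spacing, so q is one-third, the 33-per cent rule of equation (14). The normal-curve value of about 27 per cent and this rectangular value bracket the 29 or 30 per cent suggested in the text for moderately platykurtic distributions.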

More serious, probably, than the assumption of normality is the assumption
that the standard error of measurement does not vary with score level,
which in this case reduces to the assumption that the mean value of s is the 
same in the two symmetrical tails. When we use the criterion of internal 
consistency, with an experimental test made up of recognition-type items 
rather than recall or free-answer items, it is well known that moderately easy 
items tend to exhibit higher internal consistency than do moderately hard 
items. It is fairly reasonable to assume that the error variance is about the 
same in the low group for easy items as in the high group for hard items. But 
the easy items will contribute little additional error variance in the high 
group, since almost everyone in that group will know the answers to most 
of them and will not have to guess, while the hard items will contribute 
much additional error variance in the low group. It is probable, therefore, 
that the mean value of s will be greater in the low group than in the high 
group. In this case the maximum CR will probably require more cases in the 
low group than in the high group, but how many more is a problem requiring 
further investigation. 
BOOK REVIEWS


J. RAYMOND GERBERICH. Specimen Objective Test Items: A Guide to Achievement Test
Construction. New York: Longmans, Green and Co., 1956. Pp. ix + 436.


This book presents a new approach to the problem of helping class-room teachers 
and other test constructors improve their tests. It concentrates primarily on item writing, 
and seeks to aid prospective item writers by providing models—a wide selection of forms, 
types, and varieties of objective test items drawn from tests in all subject matter fields,
and at all educational levels. This basic emphasis on the importance of item writing, and
this procedure of presenting a variety of examples are both admirable. Many teachers,
unfortunately, resort to testing for trivia whenever they use objective test items. The 
primary function of this book is to guide them in the development of items which measure 
some of the more important outcomes of education. 

The book is organized in four parts. Part I devotes 19 pages to a brief outline of 
procedures in objective achievement test construction. Part II, in 195 pages, presents 227 
excerpts from published tests. These excerpts are organized around ten important out- 
comes of instruction such as skills, knowledges, appreciations, attitudes, and adjustments. 
Part III, in 137 pages, presents a variety of schemes for classifying these sample items— 
by form, type, and variety and by school subject and level. The items are also cross classified 
in two-dimension tables by type and learning outcome, by form and pupil stimulus, and 
by pupil activity and form. Part IV, in 24 pages, considers briefly a variety of special test 
techniques such as oral examinations, essay examinations, performance tests, and non- 
test tools and techniques. 

A striking and valuable feature of the book is the extensive and well-organized lists 
of references, which occupy approximately 80 pages and include over 1,000 references to 
journal articles and books. There is also a glossary of over 300 terms, emphasizing primarily 
those terms used to describe various forms of objective test items. 

One of the major problems faced by any compiler of sample test items is the develop- 
ment of a workable system for organizing the items. An ideal system would minimize the 
indeterminacy of classification and maximize the ease of locating any desired specimen. 
Professor Gerberich has wisely chosen to emphasize major learning outcomes as the 
primary basis for classifying these sample items. But some of the terms used to identify 
these outcomes, terms which teachers use frequently with apparent understanding, seem 
to mean quite different things to different people. To deal with this problem the author 
begins each chapter on a particular learning outcome by citing one or more authoritative 
definitions of that outcome. While these definitions shed some light on the meaning of the 
term, they seldom define it in the sense of setting up precise limits to its meaning. They 
seldom provide criteria which can be used to classify a given test item definitely as measuring 
one particular learning outcome rather than some other. 

The net result is that one finds items which seem almost identical in the task they 
present to the examinee classified under entirely different learning outcomes. For example, 
excerpts 4 and 107 both require the examinee to interpret shorthand symbols, but the 
first is classified as a measure of skill, while the second is classified as a measure of under- 
standing. Excerpts 1 and 69 both require the examinee to distinguish complete sentences 
from incomplete sentences, yet 1 is classified with measures of skills, whereas 69 is classified 
with measures of concepts. Excerpts 7 and 159 both require the examinee to indicate what 
punctuation is needed in a sentence, but excerpt 7 is classified with the skill items while 
excerpt 159 is classified with items measuring applications. 

The difficulty here seems to lie chiefly with the vagueness of the category concepts. 
Perhaps it would be fruitful to try building a system for organizing sample test items, not 
on the basis of these conventional labels for supposed learning outcomes, but rather on the 
basis of the tasks they present to the examinee. One might then arrive at a set of categories 
bearing descriptive labels like “interpretation of symbols,” “knowledge of word meanings,” 
“recall of factual information,” “ability to solve numerical problems,” and “ability to 
make correct decisions in practical problem situations.” These might provide a scheme for 
a somewhat more clearly determinate classification of test items. 

Part III of the book extends the usefulness of the collection of test excerpts by 
presenting a variety of cross-classifications of them. In these classifications considerable 
attention is devoted to differences in item form, type, and variety. Multiple choice items 
are classified separately from alternate response items. Five-response items are treated 
separately from four-response items. Multiple choice items in which the responses are 
words are distinguished from those in which the responses are phrases, or numbers, or 
symbols. The result is that a great deal more attention appears to be devoted to matters 
of formal difference than to matters of difference in content. 

Perhaps this emphasis on form in Part III was deliberately intended, since Part II 
was organized primarily on the basis of item function. But one has the impression that 
this collection of sample items does not quite do justice to the varied capabilities of the 
more conventional multiple choice items widely used in classroom tests and in the better 
modern standardized tests of achievement. In this collection of 227 sample items there 
appear to be only eight straightforward, unelaborated multiple choice type items. Further, 
there appears to be no example of a multiple choice item consisting of a direct question 
stem, followed by four complete statements as responses. Yet within this form alone it is 
possible to measure such diverse educational attainments as knowledge of word meanings, 
of facts, of laws and principles, and of explanations, and ability to solve problems or make 
practical decisions. 

What has just been said may suggest that if the reviewer had written this book he 
would have written it somewhat differently. This is true. But the aim and method of the 
book would have been essentially the same. And, one should not forget, Professor Gerberich 
has actually produced a book while the reviewer has not. It is a good book, one we have 
needed very much. I commend it to the classroom teachers for whom it was primarily 
written, and also to test specialists. They too will find it a valuable reference. 


Iowa City, Iowa Robert L. Ebel 


Sidney Siegel. Nonparametric Statistics for the Behavioral Sciences. New York: 
McGraw-Hill Book Company, 1956. Pp. xvii + 312. 


First events are always difficult to judge since there is no established frame of refer- 
ence. So it is with Siegel’s Nonparametric Statistics, which attempts to pull together for the 
first time the more important distribution-free statistics into a single volume. As a book 
on statistics, it is a model of organization. It is easy to follow and logical in development. 
The author takes the reader through logical steps from expressing a null hypothesis and its 
alternative to the final step of making a decision based on the result. I cannot recall ever 
seeing a book on statistics, for the researcher, as systematic as this one. Its faults are by 
and large sins of omission rather than of commission. 

I said that Siegel purports to pull together the more important distribution-free 
statistics along with the tables necessary for determining their significances. He claims 
an author’s prerogative to choose as he pleases, and in general he has covered the area 
quite nicely. However, it seems to this reviewer astonishing that Marshall’s test, Tukey’s 
corner test, the Mood-Brown distribution-free analysis of variance tests, all nonparametric 
tests of interaction (except by accident), Wilcoxon’s T-test for groups of unreplicated 
samples, nonparametric tests of trend, and multi-dimensional χ² are omitted. Siegel does 
have a footnote to the effect that K. V. Wilson’s article on multi-dimensional χ² came out 
after the book was written, but this seems a poor excuse since there are discussions of this 
problem in such places as Mood and Rao as well as others, and it is a natural generalization 
of simple χ² problems. Hypotheses about interaction occur often in psychological research; 
it seems extremely important to include some of the nonparametric tests of interaction. It 
also seems important that extensions to more complex designs be given, and this involves 
χ² as the principal tool. 

It is also surprising that the use of nonparametrics in estimating population param- 
eters, e.g., rank percentile levels, was not included. This omission could be justified easily 
on the basis of space requirements, but such an omission of estimation procedures does 
“weaken” the usefulness of the book as a textbook. 

There are at least three points in the book which can mislead people. First, the 
Mann-Whitney U-test is only a test of location when the two samples being studied come 
from populations with identical shapes. In fact, this test is a test of rapidity of build-up 
from a specified direction. If one population is extremely skewed and the other not, it is 
possible to obtain a significant U though in fact the two populations do not differ in location. 
The only secondary source which makes this point clear is Keith Smith’s chapter in Festinger 
and Katz. 
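
The first point can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and comes from neither Siegel nor the sources cited above: "location" is taken to mean the median, the two hypothetical populations (one symmetric, one strongly skewed) are each represented by 41 evenly spaced quantiles rather than by random draws, and significance is judged by the usual large-sample normal approximation to U without a tie correction.

```python
import math
import statistics


def mann_whitney_u(x, y):
    """U for sample x: number of pairs (xi, yj) with xi > yj, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def u_z_score(x, y):
    """Normal approximation to U (assumption: no tie correction is applied)."""
    n, m = len(x), len(y)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    return (mann_whitney_u(x, y) - mean_u) / sd_u


# Hypothetical populations, both with median 0, each represented by 41 quantiles.
n = 41
probs = [(i + 0.5) / n for i in range(n)]

# Symmetric population: uniform on (-1, 1).
symmetric = [2.0 * p - 1.0 for p in probs]

# Skewed population: lower half packed into (-0.1, 0), upper half spread over (0, 10).
skewed = [0.2 * (p - 0.5) if p <= 0.5 else 20.0 * (p - 0.5) for p in probs]

z = u_z_score(symmetric, skewed)
print("medians:", statistics.median(symmetric), statistics.median(skewed))
print("z for U:", round(z, 2))
```

Both medians are zero by construction, yet the standardized U falls well beyond the conventional two-sided 5 per cent bounds of ±1.96, because U responds to how often a score from one population exceeds a score from the other rather than to the medians themselves.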

Second, it is easy to overinterpret the author’s advocacy of nonparametric statistics 
over the parametric type, in spite of some warnings he gives. (This has been borne out in 
several discussions this reviewer has had with several researchers who have been using this 
book.) That is, nowhere does the author mention transformations which normalize non- 
normal distributions. Since parametric statistics are always more powerful than non- 
parametric statistics if the distributions are normal, it would seem wise to point out the 
possibilities of such transformations. For example, even though one has a small sample of 
subjects in a given experiment dealing with reaction time, he should know—from the 
accumulated research of others—that a logarithmic transformation will normalize the 
distribution of his scores and is to be preferred over a less powerful nonparametric statistic. 
One gets the impression that one always has to estimate the shape of the population 
distribution of scores from one’s own sample, which in fact may be quite small. There are 
many instances in psychological and social science research where there is a considerable 
backlog of information concerning score distributions and such information should be 
utilized in making a decision as to whether to transform (normalize) the distribution of 
scores and use parametric tests or whether to use only nonparametrics. 
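
A small sketch of the transformation argument follows. It rests on assumptions that are not in the book under review: the reaction times are hypothetical and are generated as evenly spaced quantiles of a lognormal distribution (so the logarithm normalizes them by construction), and skewness is measured by the ordinary moment coefficient.

```python
import math
from statistics import NormalDist, fmean


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical reaction times in seconds: 101 quantiles of a lognormal
# distribution whose log has mean -1.0 and standard deviation 0.5 (assumed values).
n = 101
normal_quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
reaction_times = [math.exp(-1.0 + 0.5 * z) for z in normal_quantiles]

# The logarithmic transformation recovers the symmetric, normal-shaped scores.
log_times = [math.log(t) for t in reaction_times]

print("skewness of raw times:", round(skewness(reaction_times), 3))
print("skewness of log times:", round(skewness(log_times), 3))
```

The raw scores are clearly skewed to the right while the transformed scores are symmetric. With real data the choice of transformation would rest on the accumulated evidence about that kind of score, which is the reviewer's point, and not on a constructed example of this sort.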

The third point also stems from Siegel’s enthusiasm for nonparametric statistics. 
It concerns his strictures concerning scales of measurements. There are many statisticians 
who do not worry about whether they are using an ordinal or an equal interval or a ratio 
type scale so long as the distributions are approximately normal in shape. As yet nonparametrics 
are extremely limited in their application; they do not exist for most multivariate 
problems. It is easy to be misled by Siegel’s discussions of scales of measurements 
into being too cautious. Frankly, we do not know as yet what effect the type of scale has 
on parametric statistics, if any. Most empirical investigations show little or no effect. 
The reader needs to be cautious about indiscriminately abandoning parametric statistics 
solely on the basis of scales of measurement. 

In summary, this book is an excellently organized presentation with many valuable 
statistics which can be profitably used. Though many worthwhile distribution-free tests 
are omitted, its virtues far outweigh its limitations and I can only say “you ought to 
have it; it is well worth while.” 


The Menninger Foundation Charles M. Solley 


Bose, R. C., Clatworthy, W. H., and Shrikhande, S. S. Tables of Partially Balanced 
Designs with Two Associate Classes. Raleigh, N. C.: Institute of Statistics of the 
Consolidated University of North Carolina; N. C. Agricultural Experiment Station 
Tech. Bull. No. 107, 1954. Pp. iv + 255. 


Binet, F. E., Leslie, R. T., Werner, S., and Anderson, R. L. Analysis of Confounded 
Factorial Experiments in Single Replications. Raleigh, N. C.: Institute of Statistics 
of the Consolidated University of North Carolina; N. C. Agricultural Experiment 
Station Tech. Bull. No. 113, 1955. Pp. 64. 


The first monograph under review (Bose et al.) is devoted to the partially balanced 
incomplete block designs. These are members of a class of designs in which each experi- 
mental unit (block) receives some but not all of the treatments. They differ from the 
balanced designs in that pairs of treatments do not appear equally often in the same block. 
While there are computational advantages in favor of the balanced arrangements and 
while the balanced arrangements have the useful characteristic of allowing all treatment 
comparisons to be made with the same accuracy, the partially balanced designs offer the 
experimenter more variety and greater freedom in planning so far as numbers of blocks, 
treatments, replications, etc., are concerned. These designs should be valuable in studies 
where a number of experimental treatments are to be used with human (or animal) sub- 
jects but there is not enough time (or money, or judges, or apparatus) to expose all of the 
subjects to all of the treatments; or where the size of the experimental unit necessarily 
precludes a complete set of treatments (as in animal studies using litter controls). Bose 
and his colleagues present in their monograph the most complete discussion of these 
designs to be found in the literature. After an introductory section of 88 pages (including 
worked examples of each of the major types) the bulk of the book is given over to a cata- 
logue of some 375 designs indexed for easy reference. For each design the experimental 
plan is given, and the efficiency factors (relative to randomized blocks) for each type of 
treatment comparison and for the over-all design are listed. 
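
The distinction between balanced and partially balanced arrangements can be sketched with a toy case. The two designs below are hypothetical illustrations chosen for their small size and are not taken from the monograph's catalogue. The check counts only how often each pair of treatments shares a block; it does not verify the full association-scheme conditions that define a partially balanced design.

```python
from collections import Counter
from itertools import combinations


def pair_concurrences(treatments, blocks):
    """Count, for every pair of treatments, the number of blocks containing both."""
    counts = Counter({pair: 0 for pair in combinations(sorted(treatments), 2)})
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


treatments = [0, 1, 2, 3]

# Balanced incomplete blocks of size 2: every pair of treatments appears once.
balanced = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Hypothetical two-class arrangement: treatments fall into groups {0, 1} and {2, 3};
# pairs from the same group never share a block, pairs from different groups share one.
partially_balanced = [(0, 2), (0, 3), (1, 2), (1, 3)]

balanced_values = set(pair_concurrences(treatments, balanced).values())
partial_values = set(pair_concurrences(treatments, partially_balanced).values())

print("balanced concurrence counts:", balanced_values)
print("partially balanced concurrence counts:", partial_values)
```

The balanced arrangement yields a single concurrence count, while the second arrangement yields two, one for each associate class. It also uses four blocks instead of six, which hints at the greater freedom in planning mentioned above.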

The second work (Binet et al.) is a description of methods of analysis for confounded 
factorials. These are factorial designs in which information on certain effects (usually 
interactions) is sacrificed because the experimenter has little or no interest in them or 
because he knows in advance that they are of little or no significance. The major contri- 
bution of this work is the development of a technique due to Yates and Bainbridge for 
computing individual treatment contrasts when the variables may have as many as seven 
levels. The several levels of each factor are assumed to be ordered at equal intervals along 
a continuous scale, and high-degree components of interaction are used as error estimates. 
An introductory discussion is given, including a worked example, followed by a number of 
experimental plans. The reviewer doubts the usefulness of the confounded factorial for 
psychology, particularly when the analysis is pursued as far as the isolation and manipula- 
tion of individual degrees of freedom. In most areas we need to learn considerably more 
about interaction (when to expect it, how to interpret it), about quantitative response 
laws in given experimental situations, about scale-unit effects, etc. Even in fields so intensively 
cultivated as learning and perception, experimental procedures and measurement 
techniques are so far from standardized that an investigator frequently has no idea what 
his results will look like beyond certain gross trends or certain significant main-effects mean 
squares. He is rarely prepared to interpret or to rationalize all the possible orthogonal 
components. Psychologists who have learned through considerable experience what to 
expect from their techniques and their subjects, and who are interested in problems and 
hypotheses for which these methods are appropriate, however, will find this publication a 
useful guide. 


Washington, D. C. Samuel B. Lyerly 





